K-Means: an example of unsupervised learning
CMSC 422
Marine Carpuat
marine@cs.umd.edu
When applying a learning algorithm, some things are properties of the problem you are trying to solve, and some things are up to you to choose as the ML programmer. Which of the following are properties of the problem?
– The data generating distribution
– The train/dev/test split
– The learning model
– The loss function
Today's Topics
• A new algorithm
  – K-Means Clustering
• Fundamental Machine Learning Concepts
  – Unsupervised vs. supervised learning
  – Decision boundary
Clustering
• Goal: automatically partition examples into groups of similar examples
• Why? It is useful for
  – Automatically organizing data
  – Understanding hidden structure in data
  – Preprocessing for further analysis
What can we cluster in practice?
• news articles or web pages by topic
• protein sequences by function, or genes according to expression profile
• users of social networks by interest
• customers according to purchase history
• galaxies or nearby stars
• …
Clustering
• Input
  – a set S of n points in feature space
  – a distance measure specifying the distance d(x_i, x_j) between pairs (x_i, x_j)
• Output
  – a partition {S_1, S_2, …, S_k} of S
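As a concrete instance, here is a minimal sketch in Python (my own illustration; it assumes Euclidean distance as the measure d and points stored as NumPy arrays):

```python
import numpy as np

def euclidean(x_i, x_j):
    """One common choice for the distance measure d(x_i, x_j)."""
    return np.sqrt(np.sum((x_i - x_j) ** 2))

# A partition {S_1, ..., S_k} of S can be represented compactly by an
# assignment vector: labels[i] = index of the cluster point i belongs to.
```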
Supervised Machine Learning as Function Approximation
Problem setting
• Set of possible instances X
• Unknown target function f: X → Y
• Set of function hypotheses H = {h | h: X → Y}
Input
• Training examples {(x_1, y_1), …, (x_N, y_N)} of unknown target function f
Output
• Hypothesis h ∈ H that best approximates target function f
Supervised vs. unsupervised learning
• Clustering is an example of unsupervised learning
• We are not given examples of classes y
• Instead we have to discover the classes in the data
2 datasets with very different underlying structure!
The K-Means Algorithm
• Input: training data, and K, the number of clusters to discover
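The algorithm alternates between two steps: assign each example to its nearest cluster center, then move each center to the mean of the examples assigned to it. A minimal NumPy sketch of this loop (my own illustration, not the lecture's reference code):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """K-Means: alternate between assigning points to the nearest
    center and moving each center to the mean of its points."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking K training examples at random.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):  # assignments stable: converged
            break
        centers = new_centers
    return centers, labels
```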
Example: using K-Means to discover 2 clusters in data
K-Means properties
• Time complexity: O(KNL), where
  – K is the number of clusters
  – N is the number of examples
  – L is the number of iterations
• K is a hyperparameter
  – Needs to be set in advance (or tuned on a dev set)
• Different initializations yield different results! (see the sketch below)
  – Doesn't necessarily converge to the best partition
• "Global" view of data: revisits all examples at every iteration
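A common way to cope with this sensitivity to initialization, sketched here on top of the kmeans function above, is to run from several random starts and keep the partition with the lowest within-cluster sum of squared distances (this still does not guarantee the best partition):

```python
def kmeans_restarts(X, K, n_restarts=10):
    """Run K-Means from several random initializations and keep the
    result with the lowest within-cluster sum of squared distances."""
    best = None
    for seed in range(n_restarts):
        centers, labels = kmeans(X, K, seed=seed)
        # Cost: squared distance of each point to its assigned center.
        cost = ((X - centers[labels]) ** 2).sum()
        if best is None or cost < best[0]:
            best = (cost, centers, labels)
    return best[1], best[2]
```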
Impact of initialization
Questions for you…
• Can you think of clusters that cannot be discovered using K-Means?
• Do you know any other clustering algorithms?
Aside: High-Dimensional Spaces are Weird
• High-dimensional spheres look more like porcupines than balls
• Distances between two random points in high dimensions are approximately the same (CIML Section 2.5)
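A quick experiment makes the second point concrete (a sketch of my own; I choose to sample pairs of points uniformly from the unit hypercube): as the dimension grows, the relative spread of pairwise distances shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000, 10000):
    # 1000 pairs of points drawn uniformly from the d-dimensional unit cube.
    a, b = rng.random((2, 1000, d))
    dists = np.linalg.norm(a - b, axis=1)
    # The relative spread (std/mean) shrinks as d grows:
    # distances between random points concentrate around a typical value.
    print(f"d={d:>5}  mean={dists.mean():7.2f}  std/mean={dists.std() / dists.mean():.3f}")
```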
Exercise: When are Decision Trees vs. K-NN appropriate?

Property of the classification problem        Can Decision Trees handle it?  Can K-NN handle it?
Binary features                               yes                            yes
Numeric features                              yes                            yes
Categorical features                          yes                            yes
Robust to noisy training examples             no (for default algorithm)     yes (when k > 1)
Fast classification is crucial                yes                            no
Many irrelevant features                      yes                            no
Relevant features have very different scales  yes                            no
What you should know
• New algorithms
  – K-NN classification
  – K-Means clustering
• Fundamental ML concepts
  – How to draw decision boundaries
  – What decision boundaries tell us about the underlying classifiers
  – The difference between supervised and unsupervised learning