INFO 1998: Introduction to Machine Learning
Lecture 9: Clustering and Unsupervised Learning
“If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake.”
— Yann LeCun, Facebook Director of AI Research
Recap: Supervised Learning
The training data you feed into your algorithm includes desired solutions:
● Two types you’ve seen so far: regressors and classifiers
● In both cases, there are definitive “answers” to learn from
● Example 1: a regressor, which predicts a value
● Example 2: a classifier, which predicts a label
Recap: Supervised Learning
Supervised learning algorithms we have covered so far:
● k-Nearest Neighbors
● Perceptron
● Linear Regression
● Logistic Regression
● Support Vector Machines
● Decision Trees and Random Forests
What’s the main underlying limitation of supervised learning?
Today: Unsupervised Learning
● In unsupervised learning, the training data is unlabeled
● The algorithm tries to learn by itself
An example: Clustering
Unsupervised Learning
Some types of unsupervised learning problems:
1. Clustering: k-Means, Hierarchical Cluster Analysis (HCA), Gaussian Mixture Models (GMMs), etc.
2. Dimensionality Reduction: Principal Component Analysis (PCA), Locally Linear Embedding (LLE)
3. Association Rule Learning: Apriori, Eclat, Market Basket Analysis
…and more
Cluster Analysis
Cluster Analysis
● Loose definition: clusters contain objects that are “similar in some way” (and “dissimilar” to objects in other clusters)
● Clusters are latent variables
● Understanding clusters can:
  - Reveal underlying trends in the data
  - Supply useful parameters for predictive analysis
  - Challenge boundaries of pre-defined classes and variables
Why Cluster Analysis?
Real-life example: Recommender Systems
[A bunch of cool logos]
Running Example: Recommender Systems
Use 1: Collaborative Filtering
● “People similar to you also liked X”
● Uses others’ ratings to suggest content
Pros:
- If cluster behavior is clear, can yield good insights
Cons:
- Computationally expensive
- Can lead to dominance of certain groups in predictions
Running Example: Recommending Movies
[Movie-streaming logos]
Running Example: Recommender Systems
Use 2: Content Filtering
● “Content similar to what YOU are viewing”
● Uses the user’s watch history to suggest content
Pros:
- Recommendations made by the learner are intuitive
- Scalable
Cons:
- Limited in scope and applicability
Another Example: Cambridge Analytica
● Used Facebook profiles to build psychological profiles, then used those traits for targeted advertising
● Ex: a personality test measuring openness, conscientiousness, extroversion, agreeableness, and neuroticism → different types of ads
How do we actually perform this “cluster analysis”?
Popular Clustering Algorithms
● k-Means Clustering
● Hierarchical Cluster Analysis (HCA)
● Gaussian Mixture Models (GMMs)
Defining ‘Similarity’
● How do we calculate the proximity of different datapoints?
● Euclidean distance: d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )
● Other distance measures:
  ○ Squared Euclidean distance, Manhattan distance
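As a quick sketch of these measures (using NumPy; the two points are made up for illustration):

import numpy as np

p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

# Euclidean distance: square root of the summed squared coordinate differences
euclidean = np.sqrt(np.sum((p - q) ** 2))   # 5.0

# Squared Euclidean: drops the square root (cheaper, same ranking of neighbors)
squared_euclidean = np.sum((p - q) ** 2)    # 25.0

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.sum(np.abs(p - q))           # 7.0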
Algorithm 1: Hierarchical Clustering
Two types:
● Agglomerative Clustering
  ○ Creates a tree of increasingly large clusters (bottom-up)
● Divisive Hierarchical Clustering
  ○ Creates a tree of increasingly small clusters (top-down)
Agglomerative Clustering Algorithm
● Steps:
  - Start with each point in its own cluster
  - Unite the two closest clusters
  - Repeat
● Creates a tree of increasingly large clusters
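A minimal sketch of this bottom-up procedure, assuming scikit-learn’s AgglomerativeClustering is available (the toy data is invented for illustration):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data: two loose groups of 2-D points
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 9], [8, 9]])

# Every point starts in its own cluster; the two closest clusters are
# united repeatedly until only n_clusters remain
model = AgglomerativeClustering(n_clusters=2)
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] (label numbering may differ)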
Agglomerative Clustering Algorithm
How do we visualize clustering? Using dendrograms
● The length of each link represents the distance between clusters just before they join
● Useful for estimating how many clusters you have
[Dendrogram of the iris dataset that we all love]
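A sketch of drawing such a dendrogram with SciPy on the iris data (Ward linkage is one common choice, not the only one):

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_iris

X = load_iris().data

# linkage() runs agglomerative clustering, recording every merge and its distance
Z = linkage(X, method="ward")

# dendrogram() draws the merge tree; long links hint at natural cluster boundaries
dendrogram(Z)
plt.ylabel("distance between clusters at the merge")
plt.show()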
Demo 1
Popular Clustering Algorithms
● k-Means Clustering
● Hierarchical Cluster Analysis (HCA)
● Gaussian Mixture Models (GMMs)
Algorithm 2: k-Means Clustering
Input parameter: k
➢ Start with k random centroids
➢ Cluster points by calculating the distance from each point to the centroids
➢ Take the average of the clustered points
➢ Use these averages as the new centroids
➢ Repeat until convergence
Interactive Demo: http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
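A from-scratch NumPy sketch of these steps (the kmeans function below is our own illustration, not a library API):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: start with k random centroids (here, k random data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Steps 3-4: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: repeat until convergence (centroids stop moving)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids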
Algorithm 2: k-Means Clustering
● A greedy algorithm
● Disadvantages:
  ○ Initial means are randomly selected, which can cause suboptimal partitions
    Possible solution: try a number of different starting points
  ○ Results depend on the value of k
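For instance, scikit-learn’s KMeans implements the “try several starting points” fix through its n_init parameter (a sketch on the iris data):

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# n_init=10 restarts k-Means from 10 different random centroid initializations
# and keeps the run with the lowest within-cluster sum of squares (inertia)
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)
print(model.inertia_)  # inertia of the best of the 10 runs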
Demo 2
Coming Up
• Assignment 9: Due at 5:30pm on May 6th, 2020
• Last Lecture: Real-world applications of machine learning (May 6th, 2020)
• Final Project Proposal Feedback: Released
• Final Project: Due on May 13th, 2020