1. Statistical Machine Learning, Lecture 07: Clustering and Evaluation. Kristian Kersting, TU Darmstadt, Summer Term 2020. Based on slides from J. Peters.

2. Today's Objectives. Make you understand how to find meaningful groups of data points and how to evaluate the performance of estimators. Covered topics: Clustering, Bias & Variance, Cross-Validation.

3. Outline 1. Clustering 2. Evaluation 3. Wrap-Up

4. 1. Clustering Outline 1. Clustering 2. Evaluation 3. Wrap-Up

5. 1. Clustering: Clustering. We introduced mixture models as part of density estimation. They are also very useful for clustering: divide the feature space into meaningful groups and find the group assignment. [Figure: two scatter plots of the same data points, before and after grouping.]

6. 1. Clustering: Clustering. Clustering is a type of unsupervised learning. Examples: k-means, mixture models.
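
To make the k-means example concrete, here is a minimal NumPy sketch of the standard Lloyd iteration; the toy data, the choice of K = 2, and the random initialization are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm): alternate between assigning
    each point to its nearest center and recomputing the centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random init
    for _ in range(n_iters):
        # Assignment step: label each point with its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return labels, centers

# Toy data: two Gaussian blobs (assumed for illustration)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centers = kmeans(X, K=2)
```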

7. 1. Clustering: Simple Clustering Methods. Agglomerative clustering: make each point a separate cluster; while the clustering is not satisfactory, merge the two clusters with the smallest inter-cluster distance. Divisive clustering: construct a single cluster containing all points; while the clustering is not satisfactory, split the cluster that yields the two components with the largest inter-cluster distance.
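
For the agglomerative procedure above, a compact way to run it in practice is SciPy's hierarchical clustering; a minimal sketch, assuming single-linkage as the inter-cluster distance and a target of two clusters (both illustrative choices):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])

# Agglomerative clustering: start with singleton clusters and repeatedly
# merge the two clusters with the smallest inter-cluster distance
# (here: single-link, i.e. closest-pair distance).
Z = linkage(X, method="single")                   # full merge history
labels = fcluster(Z, t=2, criterion="maxclust")   # cut when 2 clusters remain
```

The merge history `Z` records every step of the while-loop on the slide; `fcluster` implements the "stop when the clustering is satisfactory" criterion by cutting the hierarchy at a fixed number of clusters.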

8. 1. Clustering: Mean Shift Clustering. Mean shift is a method for finding modes in a cloud of data points, i.e., the places where the points are most dense [Comaniciu & Meer, 02].

9. 1. Clustering: Mean Shift Clustering. The mean shift procedure tries to find the modes of a kernel density estimate through local search [Comaniciu & Meer, 02]. The black lines indicate various search paths starting at different points. Paths that converge at the same point get assigned the same label.

10. 1. Clustering: Mean Shift Clustering. Start with the kernel density estimate
$$\hat{f}(x) = \frac{1}{N h^d} \sum_{i=1}^{N} k\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)$$
We can derive the mean shift procedure by taking the gradient of the kernel density estimate. For details see: D. Comaniciu, P. Meer, Mean Shift: A Robust Approach toward Feature Space Analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 603-619, 2002.
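
A small NumPy sketch of this density estimate; the Epanechnikov profile $k(y) = \max(1 - y, 0)$ (up to normalization) is an assumed example, since the slide leaves the kernel profile $k$ generic:

```python
import numpy as np

def kde(x, data, h):
    """f_hat(x) = 1/(N h^d) * sum_i k(||(x - x_i)/h||^2), with the
    Epanechnikov profile k(y) = max(1 - y, 0), up to normalization."""
    N, d = data.shape
    y = np.sum(((x - data) / h) ** 2, axis=1)  # ||(x - x_i)/h||^2 per point
    return np.maximum(1.0 - y, 0.0).sum() / (N * h**d)

rng = np.random.default_rng(0)
data = rng.normal(0, 1, (500, 2))
print(kde(np.zeros(2), data, h=0.5))  # density estimate at the origin
```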

11. 1. Clustering: Mean Shift Clustering. Start at a random data point $x$. Compute the mean shift vector
$$m_{h,g}(x) = \frac{\sum_{i=1}^{N} x_i \, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{N} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x$$
where $g(y) = -k'(y)$. Move the current point by the mean shift vector: $x \leftarrow x + m_{h,g}(x)$. Repeat until convergence.
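
A minimal sketch of the full procedure, assuming the Gaussian profile $k(y) = e^{-y/2}$, for which $g(y) = -k'(y) \propto e^{-y/2}$ (the constant factor cancels in the weighted mean); the bandwidth and the data are illustrative:

```python
import numpy as np

def mean_shift_mode(x, data, h, tol=1e-6, max_iters=500):
    """Follow the mean shift vector m_{h,g}(x) from a starting point x
    until it converges to a mode of the kernel density estimate."""
    for _ in range(max_iters):
        y = np.sum(((x - data) / h) ** 2, axis=1)          # ||(x - x_i)/h||^2
        w = np.exp(-y / 2.0)                               # g(y) per data point
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()  # weighted mean of x_i
        if np.linalg.norm(x_new - x) < tol:                # shift ~ 0: at a mode
            return x_new
        x = x_new
    return x

# Start the procedure from every data point; paths that converge to the
# same mode would receive the same cluster label.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-3, 0.5, (40, 2)), rng.normal(3, 0.5, (40, 2))])
modes = np.array([mean_shift_mode(x.astype(float), data, h=1.0) for x in data])
```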

12. 1. Clustering: Mean Shift Clustering, Illustration. Intuitive description. [Figure: a region of interest, its center of mass, and the mean shift vector; objective: find the densest region. Animation credit: Ukrainitz & Sarel]

13. 1. Clustering: Segmentation using Clustering. Clustering of simple image features, e.g., color & pixel position. [Figure: segmentation examples from Comaniciu & Meer, 02]

14. 2. Evaluation Outline 1. Clustering 2. Evaluation 3. Wrap-Up

15. 2. Evaluation: Evaluation. What have we seen so far? Classification using the Bayes classifier $p(C_k \mid x) \propto p(x \mid C_k)\, p(C_k)$, and probability density estimation to estimate the class-conditional densities $p(x \mid C_k)$. How do we know how well we are carrying out each of these tasks? We need a way of performance evaluation: for density estimation (or really, parameter estimation), and for the classifier as a whole.
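
As a concrete instance of this two-step pipeline, here is a hedged sketch: estimate Gaussian class-conditional densities $p(x \mid C_k)$ by maximum likelihood, then classify with $\arg\max_k p(x \mid C_k)\, p(C_k)$. The two-class data and the equal priors are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X0 = rng.normal(0, 1, (100, 2))   # samples from class 0
X1 = rng.normal(3, 1, (100, 2))   # samples from class 1

# Density estimation step: fit Gaussian class-conditionals p(x | C_k)
dists = [multivariate_normal(X.mean(axis=0), np.cov(X.T)) for X in (X0, X1)]
priors = np.array([0.5, 0.5])     # p(C_k), assumed equal here

def bayes_classify(x):
    """Bayes classifier: pick argmax_k p(x | C_k) p(C_k)."""
    scores = np.array([d.pdf(x) for d in dists]) * priors
    return scores.argmax()

print(bayes_classify(np.array([2.5, 2.5])))  # most likely class 1
```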

16. 2. Evaluation Evaluation Overfitting is everywhere Is there a tank in the picture? [DARPA Neural Network Study (1988-89), AFCEA International Press]

17. 2. Evaluation: Test Error vs Training Error. [Figure: four panels plotting mean squared error $\frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \hat{y}(x_i)\right)^2$ on the test set against polynomial degree $n$ for a small data set $N$; each panel shows the training error and test error curves with the optimal polynomial degree 5 marked, and regions labeled underfitting, about right, and overfitting.] Is a small training error enough to call a model good? ⇒ NO! We need to rethink model selection!
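
A hedged sketch that reproduces this experiment numerically: fit polynomials of increasing degree on a small training set and compare training and test MSE. The true function, the noise level, and the sample sizes are assumptions; the qualitative picture (training error keeps falling, test error turns back up) is the slide's point.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Noisy observations of an assumed true function sin(3x)."""
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + 0.1 * rng.normal(size=n)

x_tr, y_tr = sample(15)    # small N, as on the slide
x_te, y_te = sample(200)

for deg in range(13):
    coeffs = np.polyfit(x_tr, y_tr, deg)   # least-squares polynomial fit
    mse = lambda x, y: np.mean((y - np.polyval(coeffs, x)) ** 2) / 2
    print(f"degree {deg:2d}: train {mse(x_tr, y_tr):.4f}, test {mse(x_te, y_te):.4f}")
```

Training error decreases essentially monotonically with the degree, so it cannot be used for model selection on its own; the test error is what exposes overfitting.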

18. 2. Evaluation: Occam's Razor and Model Selection. William of Ockham (1285-1347). Model selection questions: How many parameters, a.k.a. which degree of polynomial $n$? Is your model class sufficiently rich? If not ⇒ underfitting. Too rich? ⇒ overfitting. Occam's Razor: always choose the simplest model that fits the data. Simplest = smallest model complexity!

19. 2. Evaluation: Bias and Variance. As we saw before, maximum likelihood is just one possible way to estimate a parameter. How can we assess how good an estimator is? Assume that we have an estimator $\hat{\theta}$ that estimates the parameter $\theta$ from the data set $X$. Bias of an estimator: expected deviation from the true parameter,
$$\operatorname{bias}(\hat{\theta}) = E_X[\hat{\theta}(X)] - \theta$$
Variance of an estimator: expected squared error between the estimator and the mean estimator,
$$\operatorname{var}(\hat{\theta}) = E_X\!\left[ \left( \hat{\theta}(X) - E_X[\hat{\theta}(X)] \right)^2 \right]$$

20. 2. Evaluation: Bias of an Estimator. The estimate $\hat{\theta}(X)$ is a random variable, because we assumed that $X$ is a random sample from a true underlying distribution. An estimator is biased if the expected value of the estimator $E_X[\hat{\theta}(X)]$ differs from the true value of the parameter $\theta$. Otherwise it is called unbiased, i.e., $E_X[\hat{\theta}(X)] = \theta$.

21. 2. Evaluation: Variance of an Estimator. Ideally, we want an unbiased estimator with small variance. In practice, this is not that easy, as we will see shortly.

22. 2. Evaluation Bias and Variance

23. 2. Evaluation: I am so BLUE... An estimator with zero bias and minimum variance is called a Minimum Variance Unbiased Estimator (MVUE). A Minimum Variance Unbiased Estimator which is linear in the features is called a Best Linear Unbiased Estimator (BLUE).

24. 2. Evaluation: Maximum-Likelihood Estimation (MLE) of a Gaussian. Remember, the Gaussian has two parameters, the mean $\mu$ and the variance $\sigma^2$. Let's compute the bias of the maximum-likelihood estimate of the mean of a Gaussian:
$$\hat{\mu}(X) = \frac{1}{N} \sum_{i=1}^{N} x_i$$
$$E_X[\hat{\mu}(X) - \mu] = E[\hat{\mu}(X)] - E[\mu] = E\!\left[ \frac{1}{N} \sum_{i=1}^{N} x_i \right] - \mu = \frac{1}{N} \sum_{i=1}^{N} E[x_i] - \mu = \frac{1}{N} N \mu - \mu = 0$$
The MLE of the mean of a Gaussian is UNBIASED.
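
A quick Monte Carlo check of this result (a sketch with assumed distribution parameters and sample counts): draw many data sets, compute $\hat{\mu}(X)$ on each, and verify that the average deviation from $\mu$ is near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 2.0, 1.5, 20, 100_000

# Each row is one data set X of size N; mu_hat is its sample mean
samples = rng.normal(mu, sigma, size=(trials, N))
mu_hats = samples.mean(axis=1)

print("empirical bias:", mu_hats.mean() - mu)  # ~ 0: unbiased, as derived
print("empirical variance:", mu_hats.var())    # ~ sigma^2 / N for this estimator
```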
