DATA MINING LECTURE 9 The EM Algorithm Clustering Evaluation - PowerPoint PPT Presentation

DATA MINING LECTURE 9 The EM Algorithm Clustering Evaluation Sequence segmentation

CLUSTERING

What is a Clustering? • In general a grouping of objects such that the objects in a group (cluster) are similar (or related) to one another and different from (or unrelated to) the objects in other groups • Inter-cluster distances are maximized • Intra-cluster distances are minimized

Clustering Algorithms • K-means and its variants • Hierarchical clustering • DBSCAN

MIXTURE MODELS AND THE EM ALGORITHM

Model-based clustering • In order to understand our data, we will assume that there is a generative process (a model) that creates/describes the data, and we will try to find the model that best fits the data. • Models of different complexity can be defined, but we will assume that our model is a distribution from which data points are sampled • Example: the data is the height of all people in Greece • In most cases, a single distribution is not good enough to describe all data points: different parts of the data follow a different distribution • Example: the data is the height of all people in Greece and China • We need a mixture model • Different distributions correspond to different clusters in the data.

Gaussian Distribution • Example: the data is the height of all people in Greece • Experience has shown that this data follows a Gaussian (Normal) distribution • Reminder: Normal distribution: $P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ • $\mu$ = mean, $\sigma$ = standard deviation

Gaussian Model • What is a model? • A Gaussian distribution is fully defined by the mean $\mu$ and the standard deviation $\sigma$ • We define our model as the pair of parameters $\theta = (\mu, \sigma)$ • This is a general principle: a model is defined as a vector of parameters $\theta$

Fitting the model • We want to find the normal distribution that best fits our data • Find the best values for $\mu$ and $\sigma$ • But what does best fit mean?

Maximum Likelihood Estimation (MLE) • Find the most likely parameters given the data. Given the data observations $X$, find $\theta$ that maximizes $P(\theta|X)$ • Problem: We do not know how to compute $P(\theta|X)$ • Using Bayes Rule: $P(\theta|X) = \frac{P(X|\theta)\,P(\theta)}{P(X)}$ • If we have no prior information about $\theta$ or $X$, we can assume they are uniform. Maximizing $P(\theta|X)$ is then the same as maximizing $P(X|\theta)$

Maximum Likelihood Estimation (MLE) • We have a vector $X = (x_1, \ldots, x_n)$ of values and we want to fit a Gaussian $N(\mu, \sigma)$ model to the data • Our parameter set is $\theta = (\mu, \sigma)$ • Probability of observing point $x_i$ given the parameters $\theta$: $P(x_i|\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$ • Probability of observing all points (assume independence): $P(X|\theta) = \prod_{i=1}^{n} P(x_i|\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$ • We want to find the parameters $\theta = (\mu, \sigma)$ that maximize the probability $P(X|\theta)$

Maximum Likelihood Estimation (MLE) • The probability $P(X|\theta)$ as a function of $\theta$ is called the Likelihood function: $L(\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$ • It is usually easier to work with the Log-Likelihood function: $LL(\theta) = -\sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2} - \frac{1}{2} n \log 2\pi - n \log \sigma$ • Maximum Likelihood Estimation • Find parameters $\mu, \sigma$ that maximize $LL(\theta)$ • Sample Mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \mu_X$ • Sample Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i-\mu)^2 = \sigma_X^2$
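The closed-form estimates can be checked numerically. The sketch below is only an illustration: the height values are made up, and the function names `gaussian_mle` and `log_likelihood` are hypothetical helpers, not part of the lecture. It computes the sample mean and the (divide-by-n) sample variance, then evaluates the log-likelihood so that nearby parameter values can be compared against the estimate.

```python
import math

def gaussian_mle(xs):
    """Return (mu, sigma) that maximize the Gaussian log-likelihood of xs."""
    n = len(xs)
    mu = sum(xs) / n                              # sample mean
    var = sum((x - mu) ** 2 for x in xs) / n      # sample variance (divides by n, not n - 1)
    return mu, math.sqrt(var)

def log_likelihood(xs, mu, sigma):
    """LL = -sum (x_i - mu)^2 / (2 sigma^2) - (n/2) log(2 pi) - n log(sigma)."""
    n = len(xs)
    sq = sum((x - mu) ** 2 for x in xs)
    return -sq / (2 * sigma ** 2) - 0.5 * n * math.log(2 * math.pi) - n * math.log(sigma)

# Made-up heights in cm, used only to exercise the formulas.
heights = [168.0, 172.5, 175.0, 169.5, 181.0, 177.5]
mu_hat, sigma_hat = gaussian_mle(heights)
best = log_likelihood(heights, mu_hat, sigma_hat)
```

Perturbing either parameter away from `mu_hat` or `sigma_hat` should never increase the log-likelihood, which is a quick sanity check on the derivation.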

Mixture of Gaussians • Suppose that you have the heights of people from Greece and China and the distribution looks like the figure below (dramatization)

Mixture of Gaussians • In this case the data is the result of the mixture of two Gaussians • One for Greek people, and one for Chinese people • Identifying for each value which Gaussian is most likely to have generated it will give us a clustering.

Mixture model • A value $x_i$ is generated according to the following process: • First select the nationality • With probability $\pi_G$ select Greece, with probability $\pi_C$ select China ($\pi_G + \pi_C = 1$). We can also think of this as a hidden variable $Z$ that takes two values: Greece and China • Given the nationality, generate the point from the corresponding Gaussian • $P(x_i|\theta_G) \sim N(\mu_G, \sigma_G)$ if Greece • $P(x_i|\theta_C) \sim N(\mu_C, \sigma_C)$ if China • $\theta_G$: parameters of the Greek distribution • $\theta_C$: parameters of the Chinese distribution
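The two-stage generative process can be simulated directly. This is a minimal sketch under stated assumptions: the mixture probability 0.4 and the means and standard deviations are invented for illustration, and `sample_mixture` is a hypothetical helper name. Each draw first picks the hidden label and then samples a height from the matching Gaussian.

```python
import random

def sample_mixture(n, pi_g, theta_g, theta_c, rng):
    """Draw n (z, x) pairs: z is the hidden nationality, x the observed value."""
    samples = []
    for _ in range(n):
        # Step 1: select the nationality (hidden variable Z).
        if rng.random() < pi_g:
            z, (mu, sigma) = "G", theta_g
        else:
            z, (mu, sigma) = "C", theta_c
        # Step 2: generate the point from the corresponding Gaussian.
        samples.append((z, rng.gauss(mu, sigma)))
    return samples

rng = random.Random(0)  # fixed seed so the run is repeatable
# Illustrative parameters only: pi_G = 0.4, theta_G = (177, 7), theta_C = (170, 6).
data = sample_mixture(1000, 0.4, (177.0, 7.0), (170.0, 6.0), rng)
frac_g = sum(1 for z, _ in data if z == "G") / len(data)
```

In a real clustering task only the `x` values are observed; the labels `z` are kept here just to show that the fraction of "G" draws tracks the mixture probability.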

Mixture Model • Our model has the following parameters $\Theta = (\pi_G, \pi_C, \mu_G, \sigma_G, \mu_C, \sigma_C)$ • $\pi_G, \pi_C$: mixture probabilities • $\theta_G = (\mu_G, \sigma_G)$: parameters of the Greek distribution • $\theta_C = (\mu_C, \sigma_C)$: parameters of the Chinese distribution

Mixture Model • Our model has the following parameters $\Theta = (\pi_G, \pi_C, \mu_G, \sigma_G, \mu_C, \sigma_C)$, made up of the mixture probabilities $\pi_G, \pi_C$ and the distribution parameters $\mu_G, \sigma_G, \mu_C, \sigma_C$ • For value $x_i$, we have: $P(x_i|\Theta) = \pi_G P(x_i|\theta_G) + \pi_C P(x_i|\theta_C)$ • For all values $X = (x_1, \ldots, x_n)$: $P(X|\Theta) = \prod_{i=1}^{n} P(x_i|\Theta)$ • We want to estimate the parameters that maximize the Likelihood of the data
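As a rough illustration of the mixture likelihood, the sketch below evaluates the log of the product over all points, which is a sum of logs and avoids multiplying many tiny numbers. The data values and both parameter settings are made up, and `mixture_log_likelihood` is a hypothetical helper name.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def mixture_log_likelihood(xs, pi_g, theta_g, pi_c, theta_c):
    """log P(X | Theta) = sum_i log( pi_G P(x_i|theta_G) + pi_C P(x_i|theta_C) )."""
    total = 0.0
    for x in xs:
        p = pi_g * gaussian_pdf(x, *theta_g) + pi_c * gaussian_pdf(x, *theta_c)
        total += math.log(p)
    return total

# Made-up values with two visible groups, around 165 and around 185.
xs = [160.0, 165.0, 170.0, 180.0, 185.0, 190.0]
good = mixture_log_likelihood(xs, 0.5, (185.0, 5.0), 0.5, (165.0, 5.0))
bad = mixture_log_likelihood(xs, 0.5, (120.0, 5.0), 0.5, (125.0, 5.0))
```

Parameters whose means sit on the two groups (`good`) score a much higher log-likelihood than parameters placed far from the data (`bad`), which is the quantity the estimation procedure tries to increase.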

Mixture Models • Once we have the parameters $\Theta = (\pi_G, \pi_C, \mu_G, \mu_C, \sigma_G, \sigma_C)$ we can estimate the membership probabilities $P(G|x_i)$ and $P(C|x_i)$ for each point $x_i$ • This is the probability that point $x_i$ belongs to the Greek or the Chinese population (cluster): $P(G|x_i) = \frac{P(x_i|G)\,P(G)}{P(x_i|G)\,P(G) + P(x_i|C)\,P(C)} = \frac{P(x_i|\theta_G)\,\pi_G}{P(x_i|\theta_G)\,\pi_G + P(x_i|\theta_C)\,\pi_C}$ • $P(x_i|G)$ is given by the Gaussian distribution $N(\mu_G, \sigma_G)$ for the Greek population
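The membership formula is a direct application of Bayes' rule and can be sketched in a few lines. The parameter values below are assumptions chosen so the result is easy to check by hand: with equal mixture probabilities and equal standard deviations, a point exactly halfway between the two means should get probability 0.5 for each population. `memberships` is a hypothetical helper name.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def memberships(x, pi_g, theta_g, pi_c, theta_c):
    """Return (P(G|x), P(C|x)) using Bayes' rule."""
    wg = pi_g * gaussian_pdf(x, *theta_g)   # P(x|theta_G) * pi_G
    wc = pi_c * gaussian_pdf(x, *theta_c)   # P(x|theta_C) * pi_C
    return wg / (wg + wc), wc / (wg + wc)

# Illustrative parameters: means 180 and 165, shared sigma 5, equal mixture probabilities.
p_g, p_c = memberships(172.5, 0.5, (180.0, 5.0), 0.5, (165.0, 5.0))
p_g_tall, _ = memberships(185.0, 0.5, (180.0, 5.0), 0.5, (165.0, 5.0))
```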

EM (Expectation Maximization) Algorithm • Initialize the values of the parameters in $\Theta$ to some random values • Repeat until convergence • E-Step: Given the parameters $\Theta$ estimate the membership probabilities $P(G|x_i)$ and $P(C|x_i)$ • M-Step: Compute the parameter values that (in expectation) maximize the data likelihood • Fraction of population in G, C: $\pi_C = \frac{1}{n}\sum_{i=1}^{n} P(C|x_i)$, $\pi_G = \frac{1}{n}\sum_{i=1}^{n} P(G|x_i)$ • MLE estimates if the $\pi$'s were fixed: $\mu_C = \frac{1}{n\,\pi_C}\sum_{i=1}^{n} P(C|x_i)\,x_i$, $\mu_G = \frac{1}{n\,\pi_G}\sum_{i=1}^{n} P(G|x_i)\,x_i$, $\sigma_C^2 = \frac{1}{n\,\pi_C}\sum_{i=1}^{n} P(C|x_i)\,(x_i-\mu_C)^2$, $\sigma_G^2 = \frac{1}{n\,\pi_G}\sum_{i=1}^{n} P(G|x_i)\,(x_i-\mu_G)^2$
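The E-step and M-step can be put together in a short one-dimensional sketch. Several things here are assumptions made for illustration rather than part of the lecture: the data is synthetic (two well-separated Gaussians), the starting parameters are hand-picked instead of random, and the loop runs a fixed number of iterations instead of testing for convergence. `em_two_gaussians` is a hypothetical helper name.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em_two_gaussians(xs, theta_g, theta_c, pi_g=0.5, iters=50):
    """Run a fixed number of EM iterations; returns (pi_g, theta_g, pi_c, theta_c)."""
    (mu_g, sd_g), (mu_c, sd_c) = theta_g, theta_c
    n = len(xs)
    for _ in range(iters):
        # E-step: membership probabilities P(G|x_i) and P(C|x_i).
        p_g = []
        for x in xs:
            wg = pi_g * gaussian_pdf(x, mu_g, sd_g)
            wc = (1.0 - pi_g) * gaussian_pdf(x, mu_c, sd_c)
            p_g.append(wg / (wg + wc))
        p_c = [1.0 - p for p in p_g]
        # M-step: weighted versions of the sample mean and variance.
        n_g, n_c = sum(p_g), sum(p_c)        # equal to n * pi_G and n * pi_C
        pi_g = n_g / n
        mu_g = sum(p * x for p, x in zip(p_g, xs)) / n_g
        mu_c = sum(p * x for p, x in zip(p_c, xs)) / n_c
        sd_g = math.sqrt(sum(p * (x - mu_g) ** 2 for p, x in zip(p_g, xs)) / n_g)
        sd_c = math.sqrt(sum(p * (x - mu_c) ** 2 for p, x in zip(p_c, xs)) / n_c)
    return pi_g, (mu_g, sd_g), 1.0 - pi_g, (mu_c, sd_c)

# Synthetic data: 300 points from N(185, 4) and 300 from N(160, 4).
rng = random.Random(1)
xs = [rng.gauss(185.0, 4.0) for _ in range(300)] + [rng.gauss(160.0, 4.0) for _ in range(300)]
pi_g, theta_g, pi_c, theta_c = em_two_gaussians(xs, (180.0, 10.0), (165.0, 10.0))
```

A more careful version would stop when the log-likelihood stops improving and would guard against a component collapsing onto a single point; both are left out to keep the correspondence with the update formulas visible.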

Relationship to K-means • E-Step: Assignment of points to clusters • K-means: hard assignment, EM: soft assignment • M-Step: Computation of centroids • K-means assumes common fixed variance (spherical clusters) • EM: can change the variance for different clusters or different dimensions (ellipsoid clusters) • If the variance is fixed then both minimize the same error function
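The link between soft and hard assignment can be illustrated with a small sketch. It assumes a shared, fixed standard deviation and, as an extra assumption not stated in the slide, equal mixture probabilities; under those conditions the cluster with the largest membership probability is also the one with the nearest centroid. The centroids, points, and function names are made up for the example.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def soft_assign(x, mus, sigma):
    """EM-style memberships with equal mixture probabilities and a shared sigma."""
    weights = [gaussian_pdf(x, mu, sigma) for mu in mus]
    total = sum(weights)
    return [w / total for w in weights]

def hard_assign(x, mus):
    """K-means-style assignment: index of the nearest centroid."""
    return min(range(len(mus)), key=lambda k: (x - mus[k]) ** 2)

mus = [165.0, 180.0]                          # illustrative centroids
points = [158.0, 166.0, 171.0, 174.0, 183.0]  # made-up values, none equidistant
soft = [soft_assign(x, mus, sigma=5.0) for x in points]
hard = [hard_assign(x, mus) for x in points]
# Taking the most probable cluster from the soft memberships recovers the hard labels.
agree = all(max(range(len(mus)), key=lambda k: s[k]) == h for s, h in zip(soft, hard))
```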

CLUSTERING EVALUATION

Clustering Evaluation • How do we evaluate the “goodness” of the resulting clusters? • But “clustering lies in the eye of the beholder”! • Then why do we want to evaluate them? • To avoid finding patterns in noise • To compare clusterings, or clustering algorithms • To compare against a “ground truth”

Clusters found in Random Data [Figure: four scatter plots with x and y axes ranging from 0 to 1, with panels titled Random Points, DBSCAN, K-means, and Complete Link]
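The point of the figure can be reproduced in a rough way: a clustering algorithm reports clusters even when the input is uniform noise. The sketch below is a bare-bones K-means written for illustration only (the `kmeans` helper, k = 3, and the 100 random points are all assumptions); it is not the code used to produce the original plots.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain K-means on 2-D points; returns (labels, centroids)."""
    centroids = rng.sample(points, k)   # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

rng = random.Random(0)
# Uniform random points in the unit square: no real cluster structure.
points = [(rng.random(), rng.random()) for _ in range(100)]
labels, centroids = kmeans(points, 3, rng)
```

The algorithm still partitions the square into three groups, which is why a clustering result needs to be evaluated before it is interpreted.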

Different Aspects of Cluster Validation 1. Determining the clustering tendency of a set of data, i.e., distinguishing whether non-random structure actually exists in the data. 2. Comparing the results of a cluster analysis to externally known results, e.g., to externally given class labels. 3. Evaluating how well the results of a cluster analysis fit the data without reference to external information (use only the data). 4. Comparing the results of two different sets of cluster analyses to determine which is better. 5. Determining the ‘correct’ number of clusters. For 2, 3, and 4, we can further distinguish whether we want to evaluate the entire clustering or just individual clusters.
