Machine Learning: Algorithms and Applications

Floriano Zini
Free University of Bozen-Bolzano, Faculty of Computer Science
Academic Year 2011-2012
Lecture 11: 21 May 2012

Unsupervised Learning (cont…)

Slides courtesy of Bing Liu: www.cs.uic.edu/~liub/WebMiningBook.html


Road map

n Basic concepts n K-means algorithm n Representation of clusters n Hierarchical clustering n Distance functions n Data standardization n Handling mixed attributes n Which clustering algorithm to use? n Cluster evaluation n Summary

Mixed attributes

n The distance functions we have seen are for

data with all numeric attributes, or all nominal attributes, etc.

n In many practical cases data has different types

  • f attributes, from the following 6:

q interval-scaled q ratio-scaled q symmetric binary q asymmetric binary q nominal q ordinal

n Clustering a data set involving mixed attributes is

a challenging problem


Convert to a single type

n One common way of dealing with mixed

attributes is to:

1.

Choose a dominant attribute type

2.

Convert the other types to this type

n E.g., if most attributes in a data set are

interval-scaled

q we convert ordinal attributes and ratio-scaled

attributes to interval-scaled attributes

q it is also appropriate to treat symmetric binary

attributes as interval-scaled attributes

Convert to a single type (cont…)

• It does not make much sense to convert a nominal attribute or an asymmetric binary attribute to an interval-scaled attribute
  ◦ but it is frequently done in practice by assigning numbers to them according to some hidden ordering, e.g., prices of the fruits
• Alternatively, a nominal attribute can be converted to a set of (symmetric) binary attributes, which are then treated as numeric attributes, as sketched below
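A minimal sketch of this conversion, assuming a plain Python list of attribute values (the attribute and its values are made up for illustration):

```python
def nominal_to_binary(values):
    """Convert a nominal attribute into a set of symmetric binary attributes."""
    categories = sorted(set(values))  # fix an order for the new binary attributes
    return [[1 if v == c else 0 for c in categories] for v in values]

colors = ["red", "green", "red", "blue"]
print(nominal_to_binary(colors))
# [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]  (categories: blue, green, red)
```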


Combining individual distances

• This approach computes individual attribute distances and then combines them
• A combination formula, proposed by Gower, is

  $$\mathrm{dist}(\mathbf{x}_i, \mathbf{x}_j) = \frac{\sum_{f=1}^{r} \delta_{ij}^{f} \, d_{ij}^{f}}{\sum_{f=1}^{r} \delta_{ij}^{f}} \qquad (4)$$

  ◦ The distance dist(x_i, x_j) is between 0 and 1
  ◦ r is the number of attributes
  ◦ d_ij^f is the distance contributed by attribute f, in the range [0, 1]
  ◦ δ_ij^f is an indicator:

  $$\delta_{ij}^{f} = \begin{cases} 1 & \text{if } x_{if} \text{ and } x_{jf} \text{ are not missing} \\ 0 & \text{if } x_{if} \text{ or } x_{jf} \text{ is missing} \\ 0 & \text{if attribute } f \text{ is asymmetric binary and } x_{if} = x_{jf} = 0 \end{cases}$$

Combining individual distances (cont…)

• If f is a binary or nominal attribute:

  $$d_{ij}^{f} = \begin{cases} 1 & \text{if } x_{if} \neq x_{jf} \\ 0 & \text{otherwise} \end{cases}$$

  ◦ distance (4) then reduces to:
    - equation (3) of lecture 10 if all attributes are nominal
    - the simple matching distance (1) of lecture 10 if all attributes are symmetric binary
    - the Jaccard distance (2) of lecture 10 if all attributes are asymmetric binary
• If f is interval-scaled:

  $$d_{ij}^{f} = \frac{|x_{if} - x_{jf}|}{R_f}, \qquad R_f = \max(f) - \min(f)$$

  ◦ R_f is the value range of f
  ◦ If all attributes are interval-scaled, distance (4) reduces to the Manhattan distance, assuming all attribute values are standardized
• Ordinal and ratio-scaled attributes are converted to interval-scaled attributes and handled in the same way
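A minimal sketch of formula (4), assuming each attribute is tagged with one of the types "interval", "nominal", "symmetric", or "asymmetric" (the tagging scheme and names are illustrative, not from the slides):

```python
def gower_distance(xi, xj, attr_types, ranges):
    """dist(xi, xj) = sum_f(delta_f * d_f) / sum_f(delta_f), formula (4)."""
    num, den = 0.0, 0.0
    for f, (a, b) in enumerate(zip(xi, xj)):
        if a is None or b is None:                 # delta_f = 0: missing value
            continue
        if attr_types[f] == "asymmetric" and a == 0 and b == 0:
            continue                               # delta_f = 0: both values absent
        if attr_types[f] == "interval":
            d = abs(a - b) / ranges[f]             # |x_if - x_jf| / R_f
        else:                                      # nominal or binary attribute
            d = 0.0 if a == b else 1.0
        num += d                                   # delta_f = 1 from here on
        den += 1.0
    return num / den if den > 0 else 0.0

# One interval attribute with range 10, one nominal, one asymmetric binary
xi, xj = (3.0, "red", 0), (7.0, "red", 0)
print(gower_distance(xi, xj, ["interval", "nominal", "asymmetric"], {0: 10.0}))
# (0.4 + 0.0) / 2 = 0.2  (the 0/0 asymmetric attribute is skipped)
```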



How to choose a clustering algorithm

n Clustering research has a long history

q A vast collection of algorithms are available q We only introduced several main algorithms

n Choosing the “best” algorithm is challenging

q Every algorithm has limitations and works well with certain

data distributions

q It is very hard, if not impossible, to know what distribution

the application data follow

n

The data may not fully follow any “ideal” structure or distribution required by the algorithms

q One also needs to decide how to standardize the data, to

choose a suitable distance function and to select other parameter values


How to choose a clustering algorithm (cont…)

• Due to these complexities, the common practice is to:
  1. run several algorithms using different distance functions and parameter settings
  2. carefully analyze and compare the results
• The interpretation of the results must be based on:
  ◦ insight into the meaning of the original data
  ◦ knowledge of the algorithms used
• Clustering is highly application dependent and to a certain extent subjective (personal preferences)



Cluster Evaluation: hard problem

• The quality of a clustering is very hard to evaluate because we do not know the correct clusters
• Some methods are used:
  ◦ User inspection
    - A panel of experts inspects the resulting clusters and scores them
    - Study centroids and spreads
    - Examine rules (e.g., from a decision tree) that describe the clusters
    - For text documents, one can inspect by reading
    - The final score is the average of the individual scores
• Manual inspection is labor intensive and time consuming

Cluster evaluation: ground truth

• We use some labeled data (for classification)
  ◦ Assumption: each class is a cluster
• Let the classes in the data D be C = (c_1, c_2, …, c_k)
  ◦ The clustering method produces k clusters, which divide D into k disjoint subsets D_1, D_2, …, D_k
• After clustering, a confusion matrix is constructed
  ◦ From the matrix, we compute various measures: entropy, purity, precision, recall, and F-score


Evaluation measures: Entropy

• For each cluster D_i, we can measure the entropy as

  $$\mathrm{entropy}(D_i) = -\sum_{j=1}^{k} \Pr_i(c_j) \log_2 \Pr_i(c_j)$$

  ◦ Pr_i(c_j): proportion of class c_j in cluster D_i
• The entropy of the whole clustering is

  $$\mathrm{entropy}_{total}(D) = \sum_{i=1}^{k} \frac{|D_i|}{|D|} \, \mathrm{entropy}(D_i)$$

  ◦ |D_i|/|D| is the weight of cluster D_i, proportional to its size
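A minimal sketch of both formulas, reading the clustering as a list of per-cluster class counts (the counts are made up for illustration):

```python
import math

def cluster_entropy(class_counts):
    """entropy(D_i) = -sum_j Pr_i(c_j) * log2 Pr_i(c_j)."""
    size = sum(class_counts)
    return -sum((n / size) * math.log2(n / size) for n in class_counts if n > 0)

def total_entropy(clusters):
    """Size-weighted sum of the per-cluster entropies."""
    total = sum(sum(c) for c in clusters)
    return sum((sum(c) / total) * cluster_entropy(c) for c in clusters)

# Rows = clusters D_1..D_3, columns = class counts within each cluster
print(total_entropy([[45, 3, 2], [5, 40, 5], [0, 7, 43]]))
```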

Evaluation measures: purity

• Purity measures the extent to which a cluster contains only one class of data

  $$\mathrm{purity}(D_i) = \max_j \Pr_i(c_j)$$

• The purity of the whole clustering is

  $$\mathrm{purity}_{total}(D) = \sum_{i=1}^{k} \frac{|D_i|}{|D|} \, \mathrm{purity}(D_i)$$

  ◦ |D_i|/|D| is the weight of cluster D_i, proportional to its size
• Precision, recall, and F-measure can be computed as well, based on the class that is most frequent in the cluster
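The same sketch extended to purity (same made-up counts as in the entropy example):

```python
def cluster_purity(class_counts):
    """purity(D_i) = proportion of the most frequent class in the cluster."""
    return max(class_counts) / sum(class_counts)

def total_purity(clusters):
    """Size-weighted sum of the per-cluster purities."""
    total = sum(sum(c) for c in clusters)
    return sum((sum(c) / total) * cluster_purity(c) for c in clusters)

print(total_purity([[45, 3, 2], [5, 40, 5], [0, 7, 43]]))
# (50/150)*0.90 + (50/150)*0.80 + (50/150)*0.86 = 0.8533...
```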


An example

• We can use the total entropy or purity to compare:
  ◦ different clustering results from the same algorithm
  ◦ different algorithms
• Precision, recall, and F-measure can be computed as well for each cluster
  ◦ E.g., the precision of class Science in cluster 1 is 0.89 and the recall is 0.83; the F-measure is thus 2 × 0.89 × 0.83 / (0.89 + 0.83) ≈ 0.86
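A minimal sketch of the per-cluster computation; the raw counts are hypothetical, chosen only to reproduce the 0.89/0.83 figures above:

```python
def precision_recall_f(n_class_in_cluster, cluster_size, n_class_in_data):
    """Precision/recall/F-measure of one class within one cluster."""
    p = n_class_in_cluster / cluster_size     # fraction of the cluster that is this class
    r = n_class_in_cluster / n_class_in_data  # fraction of the class captured by the cluster
    return p, r, 2 * p * r / (p + r)          # harmonic mean = F-measure

print(precision_recall_f(89, 100, 107))  # ~(0.89, 0.83, 0.86)
```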

A remark about ground truth evaluation

• Commonly used to compare different clustering algorithms
• A real-life data set for clustering has no class labels
  ◦ Thus, although an algorithm may perform very well on some labeled data sets, there is no guarantee that it will perform well on the actual application data at hand
• The fact that it performs well on some labeled data sets does give us some confidence in the quality of the algorithm
• This evaluation method is said to be based on external data or information


Evaluation based on internal information

• Intra-cluster cohesion (compactness):
  ◦ Cohesion measures how near the data points in a cluster are to the cluster centroid
  ◦ The sum of squared error (SSE) is a commonly used measure
• Inter-cluster separation (isolation):
  ◦ Separation means that different cluster centroids should be far away from one another (both measures are sketched below)
• In most applications, expert judgments are still the key
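A minimal sketch of the two measures on plain Python point lists (data and function names are illustrative):

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]

def sse(clusters):
    """Cohesion: sum of squared distances of points to their cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total

def min_centroid_separation(clusters):
    """Isolation: smallest squared distance between any two centroids."""
    cs = [centroid(points) for points in clusters]
    return min(sum((a - b) ** 2 for a, b in zip(cs[i], cs[j]))
               for i in range(len(cs)) for j in range(i + 1, len(cs)))

clusters = [[(1.0, 1.0), (1.2, 0.8)], [(5.0, 5.0), (5.1, 4.9)]]
print(sse(clusters), min_centroid_separation(clusters))  # low SSE, large separation
```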

Indirect evaluation

• In some applications, clustering is not the primary task but is used to help perform another task
• We can use the performance on the primary task to compare clustering methods
• For instance, in an application the primary task is to provide book purchasing recommendations to online shoppers
  ◦ If we can cluster shoppers according to their features, we might be able to provide better recommendations
  ◦ We can evaluate different clustering algorithms based on how well they help with the recommendation task
  ◦ Here, we assume that the recommendations can be reliably evaluated



Summary

• Clustering has a long history and is still active
  ◦ There are a huge number of clustering algorithms
  ◦ More are still coming every year
• We only introduced several main algorithms; there are many others, e.g.:
  ◦ density-based algorithms, subspace clustering, scale-up methods, neural-network-based methods, fuzzy clustering, co-clustering, etc.
• Clustering is hard to evaluate, but very useful in practice
  ◦ This partially explains why a large number of clustering algorithms are still being devised every year
• Clustering is highly application dependent and to some extent subjective


Reinforcement Learning

These slides are an adaptation of slides drawn by Tom Mitchell and modified by Liviu Ciortuz

Introduction

• Supervised learning is the simplest and most studied type of learning
• How can an agent learn behaviors when it doesn't have a teacher to tell it how to perform?
  ◦ The agent has a task to perform
  ◦ It takes some actions in the world
  ◦ At some later point, it gets feedback telling it how well it did on performing the task
  ◦ The agent performs the same task over and over again
• This problem is called reinforcement learning:
  ◦ The agent gets positive reward for tasks done well
  ◦ The agent gets negative reward for tasks done poorly


Introduction (cont…)

• The goal is to get the agent to act in the world so as to maximize its rewards
• The agent has to figure out what it did that made it get the reward/punishment
  ◦ This is known as the credit assignment problem
• Reinforcement learning can be used to train computers to do many tasks, such as:
  ◦ playing board games
  ◦ job shop scheduling
  ◦ controlling robots
  ◦ flight/taxi scheduling
  ◦ …

Overview

• Task: control learning
  ◦ make an autonomous agent (robot) perform actions, observe the consequences, and learn a control strategy
• The Q learning algorithm
  ◦ acquires optimal control strategies from delayed rewards, even when the agent has no prior knowledge of the effect of its actions on the environment (see the sketch below)
• Reinforcement learning is related to dynamic programming, which is used to solve optimization problems
  ◦ While DP assumes that the agent/program knows the effect (and rewards) of all its actions, in RL the agent has to experiment in the real world
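The algorithm itself is covered later in the course; as a preview, here is a minimal tabular Q-learning sketch, assuming a hypothetical environment function step(s, a) that returns (next_state, reward):

```python
import random

def q_learning(step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, start_state=0):
    """Learn a table Q(s, a) from delayed rewards by acting and observing."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start_state
        for _ in range(100):                      # cap the episode length
            # epsilon-greedy: mostly exploit current estimates, sometimes explore
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = step(s, a)                    # act; the environment answers
            # move Q(s, a) toward the observed reward plus discounted future value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```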


Reinforcement Learning Problem

Example: play Backgammon (TD-Gammon [Tesauro, 1995]); immediate reward:
  ◦ +100 if win
  ◦ -100 if lose
  ◦ 0 otherwise

• Target function to learn: $\pi : S \to A$
• Goal: maximize the discounted sum of rewards

  $$r_0 + \gamma r_1 + \gamma^2 r_2 + \dots, \qquad 0 \le \gamma < 1$$
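A minimal sketch of the quantity being maximized (the reward sequence is illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """r_0 + gamma*r_1 + gamma^2*r_2 + ..., with 0 <= gamma < 1."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Backgammon-style rewards: nothing until the final win (+100)
print(discounted_return([0, 0, 0, 100], gamma=0.9))  # 0.9**3 * 100 = 72.9
```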

Control learning characteristics

Learning Sequential Control Strategies Using Markov Decision Processes

Agent’s Learning Task