2018-02-27

6. Learning Partitions of a Set
Also known as clustering!
Usually, we partition sets into subsets with elements that are somewhat similar (and since similarity is often task dependent, different partitions of the same set are possible and often needed). In contrast to classification tasks, partitioning does not use given classes, it creates its own classes (although there might be some constraints on what is allowed as a class; unsupervised learning).
As such, learning partitions is not just about classifying additional examples later, it is also about discovering what should be classified together!

How to use set partitions?
- to create classes and then classify examples
- to find outliers in data sets
- to establish what feature values interesting classes should have
- to find events that over time occur sufficiently often
- to find "plays" for a group of agents to help them achieve their goals
- ...

Known methods to learn partitions:
- k-means clustering and many improvements/enhancements of it (like x-means)
- PAM (partitioning around medoids)
- sequential leader clustering
- hierarchical clustering methods
- conceptual clustering methods
- fuzzy clustering (allows an example to be in several clusters with different membership values)
- ...

Comments:
- All clustering methods have parameters (in addition to the similarity measure) that substantially influence what partitioning of the set is created; there is quite some literature on how to compare partitionings.
- But similarity is the key parameter, and it depends on what the clustering is aimed to achieve.
- Often we use a distance measure instead of a similarity (which means we change maximizing to minimizing).

6.1 k-means clustering: General idea
See essentially every text book on Machine Learning.
The basic idea is to use a given similarity (or distance) measure and a given number k to create k clusters out of the given examples (data points) by putting examples that are similar to each other into the same cluster. Since clusters need to have a center point, we start with k randomly selected center points, create clusters, compute the best center point for each cluster (the centroid) and then repeat the clustering with the new center points. This whole process is repeated either a certain number of times, until the centroids do not change, or until the improvement in the quality of the partitioning falls below a threshold. Also, we usually do several runs using different initial center points.

Learning phase: Representing and storing the knowledge
The clusters are represented by their centroids, and these k elements (which are described by their values for the features that we have in the examples) are the stored knowledge.
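To make the last two points concrete, here is a minimal sketch (not from the slides) of what the stored knowledge looks like and how a distance measure can stand in for a similarity: the learned model is just the k centroid vectors, and "maximize similarity" becomes "minimize distance". All names and values below are illustrative.

```python
from math import sqrt

# A learned k-means model is just the k centroids, one vector of
# feature values per cluster (here: plain tuples of floats, k = 2).
centroids = [(1.67, 1.33), (2.5, 3.0)]   # illustrative values

def euclidean(x, y):
    """Distance measure: the smaller, the more similar."""
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def as_similarity(dist):
    """One common way to turn a distance into a similarity in (0, 1],
    so maximizing similarity and minimizing distance coincide."""
    return 1.0 / (1.0 + dist)

# Application phase in one line: assign an example to the nearest centroid.
def nearest_cluster(example, centroids, dist=euclidean):
    return min(range(len(centroids)), key=lambda i: dist(centroids[i], example))

print(nearest_cluster((2, 2), centroids))   # -> 0 (closer to the first centroid)
```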
Learning phase: What or whom to learn from
As for so many learning methods, we learn from examples that are vectors of values for features:
ex_1: (val_11, ..., val_1n)
...
ex_s: (val_s1, ..., val_sn)
val_ij ∈ feat_j, with feat_j being the set of possible values for feature j. This forms the set Ex.
Additionally, we need to be able to define a similarity function sim: (feat_1 × ... × feat_n)^2 → R (real numbers), and sim is provided/selected from the outside.

Learning phase: Learning method
The basic algorithm is as follows (using a convergence parameter ε):

Randomly select k elements {m_1, ..., m_k} from Ex;
Quality_new := 0
Repeat
    For i = 1 to k do C_i := {}
    For all ex ∈ Ex do
        assign ex to the C_i with sim(m_i, ex) maximal
    For i = 1 to k do m_i := 1/|C_i| * Σ_{ex ∈ C_i} ex
    Quality_old := Quality_new
    Quality_new := Σ_{i=1}^{k} Σ_{ex ∈ C_i} sim(m_i, ex)
until Quality_new - Quality_old < ε

Learning phase: Learning method (cont.)
A key component of the algorithm is the computation of the new m_i's ("means", centroids). As one name suggests, they are supposed to represent the means of the members of a cluster and are computed by determining the mean value of the examples in the cluster for each individual feature.

Application phase: How to detect applicable knowledge
By finding the nearest centroid (with regard to sim) to a given example to classify.

Application phase: How to apply knowledge
Assign the example to classify to the cluster represented by the centroid to which it is nearest.

Application phase: Detect/deal with misleading knowledge
Again, not part of the method. The user has the responsibility to do a re-learning if unsatisfied with the current results.
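A runnable Python sketch of the pseudocode above. Following the earlier Comments slide, it is phrased with a distance measure (assignment minimizes, and the loop stops when the decrease in total distance falls below ε) instead of a similarity; function and variable names are my own, not from the slides.

```python
import random

def kmeans(examples, k, dist, eps=1e-6, rng=random):
    """Basic k-means: assign to nearest centroid, recompute centroids,
    repeat until the quality improvement is below eps."""
    # Randomly select k elements from Ex as initial centroids m_1..m_k.
    centroids = list(rng.sample(examples, k))
    quality_new = float("inf")
    while True:
        # For i = 1..k: C_i := {}
        clusters = [[] for _ in range(k)]
        # Assign every example to the cluster whose centroid is nearest
        # (ties go to the smaller index, as in the conceptual example later).
        for ex in examples:
            i = min(range(k), key=lambda j: dist(centroids[j], ex))
            clusters[i].append(ex)
        # m_i := mean of the members of C_i, feature by feature.
        for i, c in enumerate(clusters):
            if c:  # keep the old centroid if a cluster happens to be empty
                centroids[i] = tuple(sum(vals) / len(c) for vals in zip(*c))
        # Quality: total distance of examples to their centroid (smaller is better).
        quality_old = quality_new
        quality_new = sum(dist(centroids[i], ex)
                          for i, c in enumerate(clusters) for ex in c)
        if quality_old - quality_new < eps:   # improvement below threshold
            return centroids, clusters

# Application phase: classify a new example by its nearest centroid.
def classify(example, centroids, dist):
    return min(range(len(centroids)), key=lambda i: dist(centroids[i], example))
```

Called as, e.g., kmeans(Ex, 2, euclidean) with any distance function over the feature vectors; because the initial centroids are random, several runs with different seeds are advisable, as the slides note.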
General questions: Generalize/detect similarities?
Obviously, the similarity function sim is responsible for this. While there are some general candidates for sim, we often need specialized functions incorporating knowledge from the particular application.
In general, similarity functions should fulfill
Reflexivity: sim(x, x) = 1
Symmetry: sim(x, y) = sim(y, x)

General questions: Generalize/detect similarities? (cont.)
Among the general candidates for similarities for two vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n) are
Euclidean distance: sqrt(Σ_{i=1}^{n} (x_i - y_i)^2) (the smaller, the better)
Weighted Euclidean distance: sqrt(Σ_{i=1}^{n} w_i * (x_i - y_i)^2) (with w_i the weight for feature i; the smaller, the better, again)
Manhattan distance: Σ_{i=1}^{n} |x_i - y_i| (the smaller, the better, again)

General questions: Dealing with knowledge from other sources
Very indirectly, by influencing parameters, like the similarity function (or parameters in it).
Example:
[Figure: two clusterings of the same points, plotted over features x_1 and x_2.]
The left clustering is the result of using the standard Euclidean distance. The right clustering can be generated using a weighted Euclidean distance with weights w_1 = 0 and w_2 = 1.

(Conceptual) Example
We have two features (with real numbers as values), use Manhattan distance as similarity measure (ties are broken by assigning to the cluster with the smaller index) and set k = 2. We have the following set Ex:
ex_1: (1,1); ex_2: (2,1); ex_3: (2,2); ex_4: (2,3); ex_5: (3,3)
Select m_1, m_2: (1,1), (3,3)
C_1 = {(1,1), (2,1), (2,2)}   C_2 = {(2,3), (3,3)}
New m_i's: m_1 = (1.67, 1.33); m_2 = (2.5, 3)
No change in the C_i's, therefore no change in the m_i's and Quality. Finished.

(Conceptual) Example (cont.)
New run:
Select m_1, m_2: (2,3), (2,1)
C_1 = {(2,3), (3,3), (2,2)}   C_2 = {(2,1), (1,1)}
New m_i's: m_1 = (2.33, 2.67); m_2 = (1.5, 1)
No change in the C_i's, therefore no change in the m_i's and Quality. Finished.
Note where (2,2) ends up in each run!

Pros and cons
+ efficient and simple to implement
- often has problems with non-numerical features
- choosing k is not always easy
- finds a local optimum, not a global one
- has problems with noisy data and with outliers
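The three distance measures and the two runs of the conceptual example above are easy to check in code. Below is a small, self-contained Python sketch (written out directly rather than reusing the earlier kmeans sketch; all names are illustrative).

```python
from math import sqrt

def euclidean(x, y):
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def weighted_euclidean(x, y, w):
    return sqrt(sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)))

def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

Ex = [(1, 1), (2, 1), (2, 2), (2, 3), (3, 3)]

def one_run(centroids):
    """One pass of cluster assignment plus centroid update, Manhattan distance,
    ties broken towards the smaller cluster index (min picks the first minimum)."""
    clusters = [[] for _ in centroids]
    for ex in Ex:
        i = min(range(len(centroids)), key=lambda j: manhattan(centroids[j], ex))
        clusters[i].append(ex)
    new_centroids = [tuple(sum(v) / len(c) for v in zip(*c)) for c in clusters]
    return clusters, new_centroids

# First run, initial centroids (1,1) and (3,3):
# (2,2) is grouped with (1,1) and (2,1).
print(one_run([(1, 1), (3, 3)]))
# Second run, initial centroids (2,3) and (2,1):
# (2,2) is grouped with (2,3) and (3,3) - a different local optimum.
print(one_run([(2, 3), (2, 1)]))
```

Both runs already reproduce the centroids given on the slides after a single update, illustrating how the random initialization decides where the tie-breaking point (2,2) ends up.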