Fuzzy Systems – Fuzzy Clustering
Rudolf Kruse, Christian Moewes
{kruse,cmoewes}@iws.cs.uni-magdeburg.de
Otto-von-Guericke University of Magdeburg
Faculty of Computer Science
Department of Knowledge Processing and Language Engineering
Outline
1. Fuzzy Data Analysis
   Representation of a Datum
   Data Analysis
2. Clustering
3. Basic Clustering Algorithms
4. Distance Function Variants
5. Objective Function Variants
6. Cluster Validity
Fuzzy Data Analysis

A datum:
• is something given
• gets its sense in a certain context
• describes the condition of a certain "thing"
• carries information only if there are at least two different possibilities for that condition
• is seen as the realization of a certain variable of a universe
Representation of a Datum
• yes/no characteristic: universe consists of two elements
• graded characteristic: finite universe, grades given as numbers
• observations/measurements: universe is a Euclidean space
• continuous observations in space or time: universe is a Hilbert space, e.g., a spectrogram
• gray-scale images: universe depends on the application, e.g., x-ray images
• expert opinion: universe of logical expressions, e.g., statements, facts, rules
Data Analysis

1st level
• valuation and examination with regard to simple, essential characteristics
• analysis of frequencies, reliability tests, outliers, credibility

2nd level
• pattern matching
• grouping of observations (according to background knowledge, ...)
• possibly transformation with the aim of finding structures within the data

explorative data analysis
• examination of the data without a previously chosen mathematical model
Data Analysis

3rd level
• analysis of the data with regard to one or more mathematical models
• qualitative
  • formation of groups relating to additional characteristics expressed qualitatively
  • e.g., introduction of the notion of similarity for cluster analysis
• quantitative
  • recognition of functional relations
  • e.g., approximation, regression analysis
Data Analysis

4th level
• drawing conclusions and evaluating them
• prediction of future or missing data (e.g., time series analysis)
• assignment of data to standards (e.g., spectrogram analysis)
• combination of data (e.g., data fusion)
• valuation of conclusions
• possibly learning from data, model revision

problem
• what to do in case of vague, imprecise, or inconsistent data?
⇒ fuzzy data analysis
• common data is analyzed with fuzzy methods
Outline
1. Fuzzy Data Analysis
2. Clustering
3. Basic Clustering Algorithms
4. Distance Function Variants
5. Objective Function Variants
6. Cluster Validity
Clustering
• clustering is an unsupervised learning task
• goal: divide the dataset such that both constraints hold
  • objects belonging to the same cluster are as similar as possible
  • objects belonging to different clusters are as dissimilar as possible
• similarity is usually measured in terms of a distance function
• the smaller the distance, the more similar two data tuples are

Definition: d : ℝ^p × ℝ^p → [0, ∞) is a distance function if ∀ x, y, z ∈ ℝ^p:
(i) d(x, y) = 0 ⇔ x = y (identity),
(ii) d(x, y) = d(y, x) (symmetry),
(iii) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
Distance Functions
• Minkowski family:
  d_k(x, y) = ( Σ_{d=1}^{p} |x_d − y_d|^k )^{1/k}
• well-known special cases from this family are
  k = 1: Manhattan or city-block distance,
  k = 2: Euclidean distance,
  k → ∞: maximum distance, i.e. d_∞(x, y) = max_{d=1,...,p} |x_d − y_d|

[Figure: illustration of the unit circles of these distance functions for k = 1, k = 2, and k → ∞]
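For illustration, here is a minimal Python sketch of the Minkowski family and its limit case (not part of the original slides; the function names are chosen for this example only):

```python
# Minkowski distance d_k and the k -> infinity limit (maximum distance).
# Assumes x and y are plain Python sequences of equal length.

def minkowski(x, y, k):
    """d_k(x, y) = (sum_d |x_d - y_d|^k)^(1/k) for finite k >= 1."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)

def maximum_distance(x, y):
    """Limit case k -> infinity: d_inf(x, y) = max_d |x_d - y_d|."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = [1.0, 2.0], [4.0, 6.0]
print(minkowski(x, y, 1))      # Manhattan distance: 7.0
print(minkowski(x, y, 2))      # Euclidean distance: 5.0
print(maximum_distance(x, y))  # maximum distance:   4.0
```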
Partitioning Algorithms
• here, we focus only on partitioning algorithms
• i.e., given c ∈ ℕ, find the best partition of the data into c groups
• different from hierarchical techniques, which organize the data in a nested sequence of groups
• usually the number of (true) clusters is unknown
• using partitioning methods, however, we must specify c
Prototype-based Clustering
• we focus on prototype-based clustering algorithms
• i.e., clusters are represented by cluster prototypes C_i, i = 1, ..., c
• prototypes capture the structure (distribution) of the data in each cluster
• set of prototypes C = {C_1, ..., C_c}
• each prototype C_i is a tuple consisting of
  • the cluster center c_i, and
  • possibly additional parameters describing the size and shape of the cluster
• the prototypes are constructed by the clustering algorithms
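As a small sketch (an assumption for illustration, not taken from the slides), such a prototype could be represented as a center vector plus optional shape parameters:

```python
# Hypothetical representation of a cluster prototype: a center vector c_i,
# optionally extended by shape/size information (e.g. a covariance matrix).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ClusterPrototype:
    center: List[float]                              # cluster center c_i
    covariance: Optional[List[List[float]]] = None   # optional size/shape parameters

# a set of c = 2 prototypes for 2-dimensional data
prototypes = [
    ClusterPrototype(center=[0.0, 0.0]),
    ClusterPrototype(center=[3.0, 1.0]),
]
```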
Outline
1. Fuzzy Data Analysis
2. Clustering
3. Basic Clustering Algorithms
   Hard c-means
   Fuzzy c-means
   Possibilistic c-means
   Comparison of FCM and PCM
4. Distance Function Variants
5. Objective Function Variants
6. Cluster Validity
Basic Clustering Algorithms
Center Vectors and Objective Functions
• consider the simplest cluster prototypes, i.e., center vectors C_i = (c_i)
• distance measure d based on an inner product, e.g., the Euclidean distance
• all algorithms are based on an objective function J which
  • quantifies the goodness of the cluster model and
  • must be minimized to obtain optimal clusters
• the algorithms determine the best decomposition by minimizing J
Hard c-means
• each data point x_j in the dataset X = {x_1, ..., x_n}, X ⊆ ℝ^p, is assigned to exactly one cluster
  ⇒ each cluster Γ_i ⊂ X
• the set of clusters Γ = {Γ_1, ..., Γ_c} must be an exhaustive partition of X into c non-empty and pairwise disjoint subsets Γ_i, 1 < c < n
• the data partition is optimal when the sum of squared distances between the cluster centers and the data points assigned to them is minimal
• clusters should be as homogeneous as possible
Hard c-means
• objective function of hard c-means:
  J_h(X, U_h, C) = Σ_{i=1}^{c} Σ_{j=1}^{n} u_ij · d_ij²
• U = (u_ij) ∈ {0, 1}^{c×n} is called the partition matrix, with
  u_ij = 1 if x_j ∈ Γ_i, and u_ij = 0 otherwise
• each data point is assigned to exactly one cluster:
  Σ_{i=1}^{c} u_ij = 1,  ∀ j ∈ {1, ..., n}
• every cluster must contain at least one data point:
  Σ_{j=1}^{n} u_ij > 0,  ∀ i ∈ {1, ..., c}
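A minimal Python sketch of this objective (not from the slides; it assumes the squared Euclidean distance as d_ij² and a crisp partition matrix U):

```python
# J_h = sum_i sum_j u_ij * d_ij^2 with u_ij in {0, 1}.
def squared_euclidean(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def j_hard(X, U, centers):
    """Sum of squared distances between data points and their assigned centers."""
    return sum(
        U[i][j] * squared_euclidean(X[j], centers[i])
        for i in range(len(centers))
        for j in range(len(X))
    )

X = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0]]
U = [[1, 1, 0],   # cluster 1 contains x_1 and x_2
     [0, 0, 1]]   # cluster 2 contains x_3
centers = [[0.0, 0.5], [4.0, 0.0]]
print(j_hard(X, U, centers))  # 0.25 + 0.25 + 0.0 = 0.5
```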
Alternating Optimization Scheme
• J_h depends on the cluster centers c_i and the assignment U of data points to clusters
• finding the parameters that minimize J_h is NP-hard
• hard c-means minimizes J_h by alternating optimization (AO):
  1. the parameters to optimize are split into two groups
  2. one group is optimized while holding the other group fixed (and vice versa)
  3. this iterative update scheme is repeated until convergence
• it cannot be guaranteed that the global optimum will be reached
• the algorithm may get stuck in a local minimum
AO Scheme for Hard c-means
1. choose initial centers c_i, e.g., by randomly picking c data points from X
2. hold C fixed and determine the U that minimizes J_h:
   each data point is assigned to its closest cluster center,
   u_ij = 1 if i = argmin_{k=1,...,c} d_kj, and u_ij = 0 otherwise
   (any other assignment would not minimize J_h for fixed cluster centers)
3. hold U fixed and update each c_i as the mean of all x_j assigned to it
   (the mean minimizes the sum of squared distances in J_h), formally
   c_i = ( Σ_{j=1}^{n} u_ij x_j ) / ( Σ_{j=1}^{n} u_ij )
4. both steps are repeated until no change in C or U can be observed
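The following Python sketch implements this AO scheme under simplifying assumptions (squared Euclidean distance, random initialization from the data, a fixed iteration cap); it is an illustration, not the lecturers' reference code:

```python
import random

def squared_euclidean(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def hard_c_means(X, c, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(X, c)]    # step 1: initial centers from X
    assignment = [None] * len(X)
    for _ in range(max_iter):
        # step 2: assign each point to its closest center
        new_assignment = [
            min(range(c), key=lambda i: squared_euclidean(x, centers[i]))
            for x in X
        ]
        if new_assignment == assignment:             # step 4: stop when nothing changes
            break
        assignment = new_assignment
        # step 3: recompute each center as the mean of its assigned points
        for i in range(c):
            members = [X[j] for j in range(len(X)) if assignment[j] == i]
            if members:                              # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

X = [[0.0, 0.0], [0.1, 0.2], [4.0, 4.1], [3.9, 4.0]]
centers, labels = hard_c_means(X, c=2)
print(centers, labels)
```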
Example
• symmetric dataset with two clusters
• hard c-means assigns a crisp label to the data point in the middle
• is this very intuitive?
Discussion
• hard c-means tends to get stuck in a local minimum
• it is therefore necessary to conduct several runs with different initializations [Duda and Hart, 1973]
• sophisticated initialization methods can be used as well, e.g., Latin hypercube sampling [McKay et al., 1979]
• the best result of the many clusterings can be chosen based on J_h
• the crisp memberships u_ij ∈ {0, 1} prohibit ambiguous assignments
• when clusters are badly delineated or overlapping, relaxing the requirement u_ij ∈ {0, 1} is needed