Evidential Clustering: a Review of Some New Developments
Thierry Denœux
Université de Technologie de Compiègne, HEUDIASYC (UMR CNRS 6599)
https://www.hds.utc.fr/~tdenoeux
4th International Conference on Belief Functions (Belief 2016), Prague, CZ, September 21, 2016
Clustering

n objects described by
- attribute vectors $x_1, \dots, x_n$ (attribute data), or
- dissimilarities (proximity data).

Goals:
1. Discover groups in the data.
2. Assess the uncertainty in group membership.
Hard and soft clustering concepts

Hard clustering: no representation of uncertainty. Each object is assigned to one and only one group. Group membership is represented by binary variables $u_{ik}$ such that $u_{ik} = 1$ if object $i$ belongs to group $k$ and $u_{ik} = 0$ otherwise.

Fuzzy clustering: each object has a degree of membership $u_{ik} \in [0, 1]$ to each group, with $\sum_{k=1}^{c} u_{ik} = 1$. The $u_{ik}$'s can be interpreted as probabilities.

Fuzzy clustering with noise cluster: the above equality is replaced by $\sum_{k=1}^{c} u_{ik} \le 1$. The number $1 - \sum_{k=1}^{c} u_{ik}$ is interpreted as a degree of membership in (or probability of belonging to) a noise cluster.
Hard and soft clustering concepts (continued)

Possibilistic clustering: the membership vector $(u_{i1}, \dots, u_{ic})$ is free to take any value in $[0, 1]^c$. Each number $u_{ik}$ is interpreted as a degree of possibility that object $i$ belongs to group $k$.

Rough clustering: each cluster $\omega_k$ is characterized by a lower approximation $\underline{\omega}_k$ and an upper approximation $\overline{\omega}_k$, with $\underline{\omega}_k \subseteq \overline{\omega}_k$; the membership of object $i$ to cluster $k$ is described by a pair $(\underline{u}_{ik}, \overline{u}_{ik}) \in \{0, 1\}^2$, with $\underline{u}_{ik} \le \overline{u}_{ik}$, $\sum_{k=1}^{c} \underline{u}_{ik} \le 1$ and $\sum_{k=1}^{c} \overline{u}_{ik} \ge 1$.
Clustering and belief functions

Clustering structure       Uncertainty framework
fuzzy partition            probability theory
possibilistic partition    possibility theory
rough partition            (rough) sets
?                          belief functions

As belief functions extend probabilities, possibilities and sets, could the theory of belief functions provide a more general and flexible framework for cluster analysis?

Objectives:
- Unify the various approaches to clustering.
- Achieve a richer and more accurate representation of uncertainty.
- Develop new clustering algorithms and new tools to compare and combine clustering results.
Outline

1. Evidential clustering
   - Credal partition
   - Summarization of a credal partition
   - Relational representation of a credal partition
2. Evidential clustering algorithms
   - Evidential c-means
   - EVCLUS
   - Ek-NNclus
3. Comparing and combining the results of soft clustering algorithms
   - The credal Rand index
   - Combining clustering structures
Evidential clustering

Let $O = \{o_1, \dots, o_n\}$ be a set of $n$ objects and $\Omega = \{\omega_1, \dots, \omega_c\}$ be a set of $c$ groups (clusters). Each object $o_i$ belongs to at most one group.

Evidence about the group membership of object $o_i$ is represented by a mass function $m_i$ on $\Omega$:
- for any nonempty set of clusters $A \subseteq \Omega$, $m_i(A)$ is the probability of knowing only that $o_i$ belongs to one of the clusters in $A$;
- $m_i(\emptyset)$ is the probability of knowing that $o_i$ does not belong to any of the $c$ groups.

The $n$-tuple $M = (m_1, \dots, m_n)$ is called a credal partition.
Example: Butterfly data

[Figure: scatter plot of the Butterfly data, objects labeled 1 to 12, axes $x_1$ (horizontal) and $x_2$ (vertical).]

Credal partition (selected objects):

            ∅      {ω1}    {ω2}    {ω1, ω2}
m_3         0      1       0       0
m_5         0      0.5     0       0.5
m_6         0      0       0       1
m_12        0.9    0       0.1     0
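The four mass functions in the table can be encoded directly, which makes it easy to check that each one sums to 1 and to read off beliefs and plausibilities. The sketch below is only an illustration: the dictionary-of-frozensets encoding, the labels `"w1"` and `"w2"`, and the helper names are assumptions, and `bel` and `pl` follow the standard unnormalized definitions, which the slides use without restating.

```python
# Frame of discernment for the Butterfly example (labels are illustrative).
OMEGA = frozenset({"w1", "w2"})

# Each mass function maps focal sets (frozensets of clusters) to masses.
credal_partition = {
    3: {frozenset({"w1"}): 1.0},
    5: {frozenset({"w1"}): 0.5, OMEGA: 0.5},
    6: {OMEGA: 1.0},
    12: {frozenset(): 0.9, frozenset({"w2"}): 0.1},
}


def is_mass_function(m, tol=1e-9):
    """Check that masses are nonnegative, live on subsets of OMEGA, and sum to 1."""
    valid_sets = all(A <= OMEGA and v >= 0.0 for A, v in m.items())
    return valid_sets and abs(sum(m.values()) - 1.0) <= tol


def bel(m, A):
    """Belief: total mass of the nonempty focal sets included in A."""
    return sum(v for B, v in m.items() if B and B <= A)


def pl(m, A):
    """Plausibility: total mass of the focal sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)


w1 = frozenset({"w1"})
for i, m in credal_partition.items():
    print(i, is_mass_function(m), bel(m, w1), pl(m, w1))
```

Object 5 is the ambiguous point: its belief in $\{\omega_1\}$ is 0.5 while its plausibility is 1. Object 12 carries most of its mass on the empty set, which is how a credal partition flags a likely outlier.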
Relationship with other clustering structures

[Diagram: hierarchy of clustering structures, from more general (top) to less general (bottom), each labeled with the corresponding condition on the mass functions $m_i$.]

- Credal partition: $m_i$ general
- Fuzzy partition with a noise cluster: $m_i$ unnormalized Bayesian
- Possibilistic partition: $m_i$ consonant
- Rough partition: $m_i$ logical
- Fuzzy partition: $m_i$ Bayesian
- Hard partition: $m_i$ certain
Rough clustering as a special case

Assume that each $m_i$ is logical, i.e., $m_i(A_i) = 1$ for some $A_i \subseteq \Omega$, $A_i \neq \emptyset$. We can then define the lower and upper approximations of cluster $\omega_k$ as

$\underline{\omega}_k = \{o_i \in O \mid A_i = \{\omega_k\}\}, \quad \overline{\omega}_k = \{o_i \in O \mid \omega_k \in A_i\}.$

The membership values to the lower and upper approximations of cluster $\omega_k$ are $\underline{u}_{ik} = Bel_i(\{\omega_k\})$ and $\overline{u}_{ik} = Pl_i(\{\omega_k\})$.

[Figure: lower approximations $\omega_1^L$, $\omega_2^L$ and upper approximations $\omega_1^U$, $\omega_2^U$ of two clusters, with objects such that $m(\{\omega_1\}) = 1$, $m(\{\omega_1, \omega_2\}) = 1$ and $m(\{\omega_2\}) = 1$.]
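As a small illustration of these definitions, the following sketch builds the lower and upper approximations from a logical credal partition in which each object is described only by its single focal set $A_i$. The function name, the object indices and the cluster labels are hypothetical; the three objects mirror the three cases shown in the figure.

```python
def rough_approximations(focal_sets, clusters):
    """Lower and upper approximations of each cluster from logical mass functions.

    focal_sets maps an object index i to its unique focal set A_i (a frozenset).
    """
    lower = {
        k: {i for i, A in focal_sets.items() if A == frozenset({k})}
        for k in clusters
    }
    upper = {k: {i for i, A in focal_sets.items() if k in A} for k in clusters}
    return lower, upper


# Three illustrative objects: surely in w1, ambiguous, surely in w2.
focal_sets = {
    1: frozenset({"w1"}),
    2: frozenset({"w1", "w2"}),
    3: frozenset({"w2"}),
}
lower, upper = rough_approximations(focal_sets, ["w1", "w2"])
print(lower)
print(upper)
```

The ambiguous object 2 belongs to both upper approximations and to neither lower approximation, so each lower approximation is included in the corresponding upper approximation.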
Summarization of a credal partition

[Diagram: transformations from more complex (top) to less complex (bottom) clustering structures.]

- Credal partition → fuzzy partition with a noise cluster: unnormalized pignistic/plausibility transformation
- Credal partition → possibilistic partition: contour function
- Credal partition → rough partition: interval dominance or maximum mass
- Fuzzy partition with a noise cluster → fuzzy partition: normalization
- Fuzzy partition → hard partition: maximum probability
- Possibilistic partition → hard partition: maximum plausibility
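Some of these summarization paths can be sketched in a few lines. The example below is a minimal illustration rather than a reference implementation: it assumes the contour function is the plausibility of each singleton, and it takes the unnormalized pignistic transformation to share each mass $m(A)$ equally among the clusters of $A$ while leaving $m(\emptyset)$ as the noise-cluster membership. The function names and the two-cluster frame are hypothetical.

```python
OMEGA = ("w1", "w2")  # illustrative two-cluster frame


def contour(m):
    """Possibilistic summary: plausibility of each singleton cluster."""
    return {w: sum(v for A, v in m.items() if w in A) for w in OMEGA}


def unnormalized_pignistic(m):
    """Fuzzy summary with noise cluster: split each mass m(A) equally over A."""
    return {w: sum(v / len(A) for A, v in m.items() if w in A) for w in OMEGA}


def normalize(u):
    """Rescale memberships so that they sum to 1 (assumes a positive total)."""
    total = sum(u.values())
    return {w: v / total for w, v in u.items()}


def hard_assignment(u):
    """Pick the cluster with maximum membership (probability or plausibility)."""
    return max(u, key=u.get)


m5 = {frozenset({"w1"}): 0.5, frozenset({"w1", "w2"}): 0.5}
m12 = {frozenset(): 0.9, frozenset({"w2"}): 0.1}

print(contour(m5), hard_assignment(contour(m5)))
fuzzy12 = unnormalized_pignistic(m12)
noise12 = 1.0 - sum(fuzzy12.values())  # membership in the noise cluster
print(fuzzy12, noise12, normalize(fuzzy12))
```

Each step discards information: after normalization, object 12 looks like a certain member of $\omega_2$, although its original mass function says it is most probably an outlier.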
From evidential to rough clustering

For each $i$, let $A_i \subseteq \Omega$ be the set of non-dominated clusters

$A_i = \{\omega \in \Omega \mid \forall \omega' \in \Omega,\ Bel_i^*(\{\omega'\}) \le Pl_i^*(\{\omega\})\},$

where $Bel_i^*$ and $Pl_i^*$ are the normalized belief and plausibility functions.

Lower approximation:

$\underline{u}_{ik} = \begin{cases} 1 & \text{if } A_i = \{\omega_k\} \\ 0 & \text{otherwise.} \end{cases}$

Upper approximation:

$\overline{u}_{ik} = \begin{cases} 1 & \text{if } \omega_k \in A_i \\ 0 & \text{otherwise.} \end{cases}$

The outliers can be identified separately as the objects for which $m_i(\emptyset) \ge m_i(A)$ for all $A \neq \emptyset$.
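The interval dominance rule can be sketched as follows. This is an illustrative reading of the definitions above, with the normalization assumed to be the usual division by $1 - m_i(\emptyset)$; the function names and the choice to return an empty set when $m_i(\emptyset) = 1$ are assumptions, not part of the slides.

```python
def nondominated_clusters(m, clusters):
    """Set A_i of clusters w such that Pl*({w}) >= Bel*({w'}) for every w'."""
    empty_mass = m.get(frozenset(), 0.0)
    if empty_mass >= 1.0:
        return set()  # normalization undefined; assumed convention
    scale = 1.0 - empty_mass
    bel = {w: m.get(frozenset({w}), 0.0) / scale for w in clusters}
    pl = {w: sum(v for A, v in m.items() if w in A) / scale for w in clusters}
    highest_belief = max(bel.values())
    return {w for w in clusters if pl[w] >= highest_belief}


def rough_memberships(m, clusters):
    """Pairs (lower, upper) of binary memberships for each cluster."""
    A_i = nondominated_clusters(m, clusters)
    return {w: (int(A_i == {w}), int(w in A_i)) for w in clusters}


def is_outlier(m):
    """True when the empty set carries at least as much mass as any other set."""
    empty_mass = m.get(frozenset(), 0.0)
    return all(empty_mass >= v for A, v in m.items() if A)


clusters = ["w1", "w2"]
m3 = {frozenset({"w1"}): 1.0}
m5 = {frozenset({"w1"}): 0.5, frozenset({"w1", "w2"}): 0.5}
m12 = {frozenset(): 0.9, frozenset({"w2"}): 0.1}

for name, m in [("m3", m3), ("m5", m5), ("m12", m12)]:
    print(name, rough_memberships(m, clusters), is_outlier(m))
```

With these inputs, object 3 falls in the lower approximation of $\omega_1$, object 5 falls only in both upper approximations, and object 12 is flagged as an outlier even though its normalized mass function points to $\omega_2$.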
Relational representation of a hard partition

A hard partition can be represented equivalently by
- the $n \times c$ membership matrix $U = (u_{ik})$, or
- an $n \times n$ relation matrix $R = (r_{ij})$ representing the equivalence relation

$r_{ij} = \begin{cases} 1 & \text{if } o_i \text{ and } o_j \text{ belong to the same group} \\ 0 & \text{otherwise.} \end{cases}$

The relational representation $R$ is invariant under renumbering of the clusters, and is thus more suitable for comparing or combining several partitions.

What is the counterpart of matrix $R$ in the case of a credal partition?
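The invariance property is easy to check on a toy example. In the sketch below, a hard partition is given as a list of cluster labels (an assumed encoding), and the same partition is written with two different numberings of the clusters.

```python
def relation_matrix(labels):
    """n x n matrix R with r_ij = 1 when objects i and j share a cluster label."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]


# The same partition of five objects under two numberings of the clusters.
labels = [0, 0, 1, 2, 1]
renumbered = [2, 2, 0, 1, 0]

R = relation_matrix(labels)
R_renumbered = relation_matrix(renumbered)
print(R == R_renumbered)
```

The membership matrices built from `labels` and `renumbered` would differ by a permutation of their columns, whereas the two relation matrices are identical, which is why $R$ is the more convenient object for comparing partitions.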
Pairwise representation

Let $M = (m_1, \dots, m_n)$ be a credal partition. For a pair of objects $\{o_i, o_j\}$, let $Q_{ij}$ be the question "Do $o_i$ and $o_j$ belong to the same group?" defined on the frame $\Theta = \{S, \neg S\}$. $\Theta$ is a coarsening of $\Omega^2$.

Given $m_i$ and $m_j$ on $\Omega$, a mass function $m_{ij}$ on $\Theta$ can be computed as follows:
1. Extend $m_i$ and $m_j$ to $\Omega^2$;
2. Combine the extensions of $m_i$ and $m_j$ by the unnormalized Dempster's rule;
3. Compute the restriction of the combined mass function to $\Theta$.

[Figure: the product space $\Omega^2$ for $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, showing the region that corresponds to the element $S$ of $\Theta$.]
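The three steps can be sketched compactly under one reading of the construction, which is an assumption here: extending $m_i(A)$ and $m_j(B)$ to $\Omega^2$ and combining them without normalization gives the product $m_i(A)\,m_j(B)$ to the focal set $A \times B$, and the restriction to $\Theta$ keeps $S$ if $A \times B$ contains a pair of identical clusters and $\neg S$ if it contains a pair of distinct clusters. The names `pairwise_mass`, `S` and `NOT_S` are hypothetical.

```python
from collections import defaultdict

S, NOT_S = "S", "not S"
THETA = frozenset({S, NOT_S})


def pairwise_mass(m_i, m_j):
    """Mass function on Theta = {S, not S} for a pair of objects.

    m_i and m_j map focal sets (frozensets of clusters) to masses.
    """
    m_ij = defaultdict(float)
    for A, mass_a in m_i.items():
        for B, mass_b in m_j.items():
            image = set()
            if A & B:
                # A x B contains a pair (w, w): "same group" is possible.
                image.add(S)
            if any(x != y for x in A for y in B):
                # A x B contains a pair (w, w') with w != w'.
                image.add(NOT_S)
            # If A or B is empty, the image is the empty set.
            m_ij[frozenset(image)] += mass_a * mass_b
    return dict(m_ij)


m3 = {frozenset({"w1"}): 1.0}
m5 = {frozenset({"w1"}): 0.5, frozenset({"w1", "w2"}): 0.5}
m6 = {frozenset({"w1", "w2"}): 1.0}
m12 = {frozenset(): 0.9, frozenset({"w2"}): 0.1}

print(pairwise_mass(m3, m5))
print(pairwise_mass(m5, m6))
print(pairwise_mass(m3, m12))
```

With the Butterfly masses, objects 3 and 5 get mass 0.5 on $\{S\}$ and 0.5 on $\Theta$, objects 5 and 6 get all their mass on $\Theta$ (nothing is known about whether they are together), and objects 3 and 12 get mass 0.9 on the empty set and 0.1 on $\{\neg S\}$.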