 
Separation and convexity properties of hierarchical and non-hierarchical clustering
Patrice Bertrand (CEREMADE, Université Paris-Dauphine, Paris, France)
Joint work with Jean Diatta (LIM, Université de La Réunion, Saint-Denis, France)
Plan
1. Background
2. Ternary separation and convexity
3. Characterizations of clustering structures
4. Application to Cluster Analysis
◮ Multi-level clustering structures
• Hierarchies: Johnson (1967), Benzécri (1973)
• Weak hierarchies: Bandelt & Dress (1989, 1994), Diatta & Fichet (1994, 1998), Bertrand & Janowitz (2002)
• Pyramids (or pseudo-hierarchies): Diday (1984, 1986), Fichet (1984, 1986)
• Paired hierarchies: Bertrand (2002, 2008), Bertrand & Brucker (2007)
Definitions
A pair $\{A, B\}$ of subsets of $E$ (the ground set) is said to be
◮ hierarchical if $A \cap B \in \{A, B, \emptyset\}$.
If $\{A, B\}$ is not hierarchical, then $A$ and $B$ cross each other.
[Figure: two crossing sets $A$ and $B$]
We use the following terminology for $\mathcal{F} \subseteq 2^E$:
◮ set-system: $\emptyset \notin \mathcal{F}$ and $E \in \mathcal{F}$
◮ total: $\{x\} \in \mathcal{F}$ for all $x \in E$
◮ closed: $\mathcal{F}$ is closed under nonempty intersections, i.e. $\bigcap \mathcal{G} \in \mathcal{F} \cup \{\emptyset\}$ for all nonempty $\mathcal{G} \subseteq \mathcal{F}$
◮ (strongly) hierarchical: each pair $\{X, Y\} \subseteq \mathcal{F}$ is hierarchical
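The definitions above can be checked directly on a finite ground set. Below is a minimal Python sketch. It assumes that members of $\mathcal{F}$ are encoded as `frozenset`s and that $E$ is given as an iterable of labels. All function names (such as `is_closed` or `crosses`) are hypothetical helpers, not part of the original work.

```python
from itertools import combinations


def is_hierarchical_pair(a: frozenset, b: frozenset) -> bool:
    """{A, B} is hierarchical iff A ∩ B is A, B or the empty set."""
    return (a & b) in (a, b, frozenset())


def crosses(a: frozenset, b: frozenset) -> bool:
    """A and B cross each other iff {A, B} is not hierarchical."""
    return not is_hierarchical_pair(a, b)


def is_set_system(family, ground) -> bool:
    """The empty set is not a member, and E is a member."""
    return frozenset() not in family and frozenset(ground) in family


def is_total(family, ground) -> bool:
    """Every singleton {x}, x in E, is a member."""
    return all(frozenset([x]) in family for x in ground)


def is_closed(family) -> bool:
    """Closed under nonempty intersections.

    For a finite family, pairwise intersections suffice: any intersection of
    several members is built pairwise, and each intermediate result is either
    empty (and stays empty) or is again a member.
    """
    members = set(family)
    return all(not (a & b) or (a & b) in members for a, b in combinations(members, 2))


def is_hierarchical(family) -> bool:
    """(Strongly) hierarchical: every pair of members is hierarchical."""
    return all(is_hierarchical_pair(a, b) for a, b in combinations(set(family), 2))


# Usage: a small hierarchy on E = {a, b, c, d}.
E = "abcd"
H = [frozenset(s) for s in ["a", "b", "c", "d", "ab", "cd", "abcd"]]
print(is_set_system(H, E), is_total(H, E), is_closed(H), is_hierarchical(H))  # True True True True
```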
Weak hierarchies
A collection $\mathcal{F} \subseteq 2^E$ is said to be weakly hierarchical if
$\forall X, Y, Z \in \mathcal{F}, \; X \cap Y \cap Z \in \{X \cap Y, \; Y \cap Z, \; X \cap Z\}$.
Necessary and sufficient condition: there are no $A_1, A_2, A_3 \in \mathcal{F}$ and $a_1, a_2, a_3 \in E$ such that $a_i \in A_j \iff i \neq j$.
[Figure: the forbidden configuration, three sets $A_1, A_2, A_3$ and three points $a_1, a_2, a_3$, each $a_i$ lying in exactly the two sets $A_j$ with $j \neq i$]
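The following sketch compares the definition of a weak hierarchy with the forbidden-configuration condition, using the same hypothetical `frozenset` encoding. The search is brute force over triples of members and ordered triples of points, so it is only meant for small examples.

```python
from itertools import combinations, permutations


def is_weakly_hierarchical(family) -> bool:
    """Definition: X ∩ Y ∩ Z ∈ {X ∩ Y, Y ∩ Z, X ∩ Z} for all members X, Y, Z.

    Triples with a repeated member satisfy the condition trivially,
    so only triples taken at distinct positions are inspected.
    """
    return all((x & y & z) in (x & y, y & z, x & z)
               for x, y, z in combinations(list(family), 3))


def has_forbidden_configuration(family, ground) -> bool:
    """Search for A1, A2, A3 in the family and a1, a2, a3 in E with a_i ∈ A_j iff i != j."""
    for sets in combinations(list(family), 3):
        for pts in permutations(ground, 3):
            if all((pts[i] in sets[j]) == (i != j)
                   for i in range(3) for j in range(3)):
                return True
    return False


# Usage: three pairwise-overlapping 2-sets form the forbidden configuration.
E = "abc"
triangle = [frozenset("bc"), frozenset("ac"), frozenset("ab")]
print(is_weakly_hierarchical(triangle), has_forbidden_configuration(triangle, E))  # False True
```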
Paired hierarchies
A collection $\mathcal{F} \subseteq 2^E$ is called paired hierarchical if each $\mathcal{F}$-member crosses at most one $\mathcal{F}$-member.
Necessary and sufficient conditions:
◮ for all $X, Y, Z \in \mathcal{F}$, at least two of the pairs $\{X, Y\}$, $\{Y, Z\}$, $\{X, Z\}$ are hierarchical
◮ the relation "$X = Y$ or $X$ crosses $Y$" is an equivalence relation whose classes have at most 2 elements
◮ forbidden configurations: [Figure: three forbidden configurations (a), (b), (c)]
The term paired hierarchy is used because $\{\bigcup \mathcal{G} : \mathcal{G} \text{ is a class}\}$ is a hierarchy.
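The sketch below implements the paired-hierarchy test and the first necessary and sufficient condition, again with hypothetical helper names and `frozenset` members. `class_unions` forms the union of each class of the relation "$X = Y$ or $X$ crosses $Y$". It assumes that the family is already paired hierarchical.

```python
from itertools import combinations


def crosses(a: frozenset, b: frozenset) -> bool:
    """A and B cross iff A ∩ B is none of A, B, ∅ (a set never crosses itself)."""
    return (a & b) not in (a, b, frozenset())


def is_paired_hierarchical(family) -> bool:
    """Each member crosses at most one member."""
    members = list(dict.fromkeys(family))  # drop duplicates, keep order
    return all(sum(crosses(x, y) for y in members) <= 1 for x in members)


def triple_condition(family) -> bool:
    """For all X, Y, Z, at least two of {X, Y}, {Y, Z}, {X, Z} are hierarchical."""
    members = list(dict.fromkeys(family))
    return all(sum(not crosses(p, q) for p, q in combinations(t, 2)) >= 2
               for t in combinations(members, 3))


def class_unions(family):
    """Union of each class of the relation 'X = Y or X crosses Y'.

    Assumes the family is paired hierarchical, so each class is {X}
    or {X, Y} with X crossing Y.
    """
    members = list(dict.fromkeys(family))
    return {x.union(*[y for y in members if crosses(x, y)]) for x in members}


# Usage: {a, b} and {b, c} cross each other and nothing else.
P = [frozenset(s) for s in ["a", "b", "c", "d", "ab", "bc", "abcd"]]
print(is_paired_hierarchical(P), sorted("".join(sorted(s)) for s in class_unions(P)))
# True ['a', 'abc', 'abcd', 'b', 'c', 'd']
```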
Examples and counter-examples
[Figure: examples and counter-examples of paired hierarchies and of weak hierarchies on $E = \{a, b, c, d\}$]
Correspondences between dissimilarities and multi-level clustering structures
$d$ (dissimilarity on $E$) $\longleftrightarrow$ $(\mathcal{F}, f)$, where $\mathcal{F} \subseteq 2^E$ and $f : \mathcal{F} \to \mathbb{R}_+$ is increasing (with respect to inclusion)
◮ From $(\mathcal{F}, f)$ to a dissimilarity: $\varphi : (\mathcal{F}, f) \mapsto \varphi(\mathcal{F}, f)$ with
$\varphi(\mathcal{F}, f)(x, y) = \min\{ f(A) : x, y \in A, \; A \in \mathcal{F} \}$
Conversely, each dissimilarity $d$ is associated with:
◮ $D_d(x, y)$: the closed ball of center $x \in E$ and radius $r = d(x, y)$,
$D_d(x, y) = \{ z \in E : d(z, x) \le d(x, y) \}$
◮ $B_d(x, y)$: the 2-ball generated by $x, y \in E$ in the sense of $d$,
$B_d(x, y) = D_d(x, y) \cap D_d(y, x) = \{ z \in E : \max\{d(z, x), d(z, y)\} \le d(x, y) \}$
[Figure: the 2-ball $B_d(x, y)$ as the intersection of the two balls centered at $x$ and $y$]
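The map $\varphi$ and the two balls can be computed directly from their definitions. In this sketch a dissimilarity is stored as a dictionary keyed by pairs `(x, y)`. The example family and the index $f(A) = |A| - 1$ are illustrative choices, not taken from the slides.

```python
def phi(family, index, ground):
    """φ(F, f)(x, y) = min{ f(A) : x, y ∈ A, A ∈ F }.

    `index` maps each member A to f(A); assumes each pair x, y lies in
    some member (true for a set-system, since E is a member).
    """
    return {(x, y): min(index[a] for a in family if x in a and y in a)
            for x in ground for y in ground}


def closed_ball(d, ground, x, y):
    """D_d(x, y) = { z ∈ E : d(z, x) <= d(x, y) }."""
    return frozenset(z for z in ground if d[(z, x)] <= d[(x, y)])


def two_ball(d, ground, x, y):
    """B_d(x, y) = D_d(x, y) ∩ D_d(y, x)."""
    return closed_ball(d, ground, x, y) & closed_ball(d, ground, y, x)


# Usage: intervals of a < b < c < d, indexed by the illustrative index f(A) = |A| - 1.
E = "abcd"
C = [frozenset(s) for s in ["a", "b", "c", "d", "ab", "bc", "cd", "abc", "bcd", "abcd"]]
d = phi(C, {A: len(A) - 1 for A in C}, E)
print(d[("a", "c")], sorted(two_ball(d, E, "a", "c")))  # 2 ['a', 'b', 'c']
```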
Separation relation
A ternary relation is any subset of $E^3$.
A (ternary) separation relation is a ternary relation of the following form:
◮ given $\mathcal{F} \subseteq 2^E$, the ternary separation relation $s(\mathcal{F})$ is defined by $(x, y, z) \in s(\mathcal{F})$ if there exists an $\mathcal{F}$-member that contains $x$ and $y$ but not $z$.
In what follows, we simply write $xyz \in s(\mathcal{F})$ in place of $(x, y, z) \in s(\mathcal{F})$.
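Computing $s(\mathcal{F})$ is a direct enumeration of $E^3$. The sketch below uses a hypothetical function name and `frozenset` members. It also makes two elementary facts visible: $s(\mathcal{F})$ is symmetric in its first two coordinates, and $xyz \in s(\mathcal{F})$ forces $z \neq x, y$.

```python
from itertools import product


def separation_relation(family, ground):
    """s(F) = { (x, y, z) : some member of F contains x and y but not z }."""
    return {(x, y, z) for x, y, z in product(ground, repeat=3)
            if any(x in a and y in a and z not in a for a in family)}


# Usage on a small hierarchy.
E = "abcd"
H = [frozenset(s) for s in ["a", "b", "c", "d", "ab", "cd", "abcd"]]
s = separation_relation(H, E)
print(("a", "b", "c") in s, ("a", "c", "b") in s)  # True False
```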
Convexity
Abstract convexity (see van de Vel; the notion goes back to the early 1950s)
◮ A collection $\mathcal{C} \subseteq 2^E$ is called a convexity on $E$ if $\emptyset, E \in \mathcal{C}$ and $\mathcal{C}$ is closed under both intersections and nested unions. $(E, \mathcal{C})$ is then called a convex structure or a convexity space. Convex set: any member of $\mathcal{C}$.
◮ For all $A \subseteq E$, $\mathrm{conv}_{\mathcal{C}}(A) = \langle A \rangle_{\mathcal{C}} = \bigcap \{ C : A \subseteq C \in \mathcal{C} \}$ is called the (convex) hull of $A$.
◮ Notation: $\langle a, b \rangle := \langle a, b \rangle_{\mathcal{C}} = \langle \{a, b\} \rangle_{\mathcal{C}}$.
◮ Segment joining $a$ and $b$: the 2-polytope $\mathrm{conv}(\{a, b\})$.
◮ The map $\langle \cdot,\cdot \rangle_{\mathcal{C}} : (a, b) \in E^2 \mapsto \langle a, b \rangle_{\mathcal{C}} \in 2^E$ is called the segment operator of the convexity $\mathcal{C}$.
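On a finite ground set, the hull and the segment operator can be computed as intersections of convex sets. This is a minimal sketch, assuming $\mathcal{C}$ is given as a list of `frozenset`s. The helper names and the example convexity are hypothetical.

```python
from itertools import combinations


def is_convexity(family, ground) -> bool:
    """Check that ∅, E ∈ C and that C is closed under intersections.

    On a finite E, a nonempty chain has a largest member, which is its union,
    and the empty chain has union ∅, so nested unions need no separate check.
    Closure under pairwise intersections gives closure under all intersections.
    """
    members = set(family)
    if frozenset() not in members or frozenset(ground) not in members:
        return False
    return all((a & b) in members for a, b in combinations(members, 2))


def hull(family, subset):
    """conv_C(A) = ⟨A⟩_C: intersection of all convex sets containing A (E is one of them)."""
    return frozenset.intersection(*[c for c in family if subset <= c])


def segment(family, a, b):
    """Segment operator: (a, b) ↦ ⟨a, b⟩_C = ⟨{a, b}⟩_C."""
    return hull(family, frozenset([a, b]))


# Usage: a convexity on E = {a, b, c, d} made of intervals of a < b < c < d, plus ∅.
E = "abcd"
C = [frozenset(s) for s in ["", "a", "b", "c", "d", "ab", "bc", "cd", "abc", "bcd", "abcd"]]
print(is_convexity(C, E), sorted(segment(C, "a", "c")))  # True ['a', 'b', 'c']
```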
Arity
The arity of $\mathcal{C}$ is $\le n$ if a set $C \subseteq E$ belongs to $\mathcal{C}$ whenever $\langle F \rangle = \mathrm{conv}(F) \subseteq C$ for all $F \subseteq C$ with $\#F \le n$.
Rank
$A \subseteq E$ is called convexly independent if $a \notin \langle A \setminus \{a\} \rangle$ for all $a \in A$. The rank of a convex structure $(E, \mathcal{C})$ is defined as the maximum size of a convexly independent set.
Interval operator
A map $I : E \times E \to 2^E$ is called an interval operator on $E$ if $a, b \in I(a, b) = I(b, a)$ for all $a, b \in E$. $I(a, b)$ is the interval between $a$ and $b$, and $(E, I)$ is an interval space.
Example: the segment operator $\langle \cdot,\cdot \rangle_{\mathcal{C}} : (a, b) \in E^2 \mapsto \langle a, b \rangle_{\mathcal{C}}$ of any convexity $\mathcal{C}$.
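The following is a brute-force sketch of the arity, rank and interval-operator notions. It enumerates $2^E$, so it is feasible only for very small $E$. The example convexity and all function names are assumptions made for illustration.

```python
from itertools import combinations


def hull(family, subset):
    """⟨A⟩: intersection of the members containing A (assumes E is a member)."""
    return frozenset.intersection(*[c for c in family if subset <= c])


def all_subsets(ground):
    return [frozenset(s) for r in range(len(ground) + 1) for s in combinations(ground, r)]


def has_arity_at_most(family, ground, n) -> bool:
    """Every C ⊆ E with ⟨F⟩ ⊆ C for all F ⊆ C, #F <= n, must be convex."""
    members = set(family)
    for c in all_subsets(ground):
        small_hulls_inside = all(hull(family, f) <= c
                                 for f in all_subsets(sorted(c)) if len(f) <= n)
        if small_hulls_inside and c not in members:
            return False
    return True


def is_convexly_independent(family, subset) -> bool:
    """No point a of A lies in the hull of A minus {a}."""
    return all(a not in hull(family, subset - {a}) for a in subset)


def rank(family, ground) -> int:
    """Maximum size of a convexly independent set."""
    return max(len(a) for a in all_subsets(ground) if is_convexly_independent(family, a))


def is_interval_operator(op, ground) -> bool:
    """a, b ∈ I(a, b) = I(b, a) for all a, b ∈ E."""
    return all(a in op(a, b) and b in op(a, b) and op(a, b) == op(b, a)
               for a in ground for b in ground)


# Usage: intervals of a < b < c < d, plus ∅.
E = "abcd"
C = [frozenset(s) for s in ["", "a", "b", "c", "d", "ab", "bc", "cd", "abc", "bcd", "abcd"]]


def seg(a, b):
    """Segment operator of C."""
    return hull(C, frozenset([a, b]))


print(has_arity_at_most(C, E, 2), rank(C, E), is_interval_operator(seg, E))  # True 2 True
```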
Notation
$\mathcal{G}_I := \{ C \subseteq E \mid \forall x, y \in C, \; I(x, y) \subseteq C \}$ is the convexity induced by $I$.
$\mathcal{G}_{\langle \cdot,\cdot \rangle_{\mathcal{C}}}$: the interval convexity induced by the segment operator $\langle \cdot,\cdot \rangle_{\mathcal{C}}$.
$\langle a, b \rangle_I$: the segment between $a$ and $b$ in the sense of the convexity $\mathcal{G}_I$.
Properties (Calder (1971))
◮ For all $a, b \in E$, $I(a, b) \subseteq \langle a, b \rangle_I$.
◮ A convexity is induced by an interval operator iff its arity is $\le 2$.
◮ The hull of a set $A$ in an interval space is given by
$\langle A \rangle = \bigcup_{k=0}^{\infty} A_k$, where $A_0 = A$ and $A_{k+1} = \bigcup \{ I(a, a') \mid a, a' \in A_k \}$ for all $k \in \mathbb{N}$.
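Calder's formula for the hull translates into a fixed-point iteration. The sketch below computes $\mathcal{G}_I$ by brute force and the hull by iteration. The operator `op` is a hypothetical example. It is chosen so that $I(0, 1) = \{0, 1, 2\}$ is strictly smaller than $\langle 0, 1 \rangle_I$, which illustrates the first property.

```python
from itertools import combinations


def induced_convexity(op, ground):
    """G_I = { C ⊆ E : I(x, y) ⊆ C for all x, y ∈ C }, by brute force over 2^E."""
    result = set()
    for r in range(len(ground) + 1):
        for s in combinations(ground, r):
            c = frozenset(s)
            if all(op(x, y) <= c for x in c for y in c):
                result.add(c)
    return result


def iterative_hull(op, subset):
    """Calder: ⟨A⟩ = ∪_k A_k with A_0 = A and A_{k+1} = ∪{ I(a, a') : a, a' ∈ A_k }.

    Since a ∈ I(a, a), the sequence is increasing, so on a finite E
    it suffices to iterate until it stabilizes.
    """
    current = frozenset(subset)
    while True:
        nxt = frozenset().union(*(op(a, b) for a in current for b in current))
        if nxt == current:
            return current
        current = nxt


# Usage: a hypothetical interval operator on E = {0, 1, 2, 3} that enlarges two intervals.
E = [0, 1, 2, 3]
SPECIAL = {frozenset({0, 1}): frozenset({0, 1, 2}), frozenset({1, 2}): frozenset({1, 2, 3})}


def op(a, b):
    """I(a, b) = {a, b}, except for the two enlarged intervals in SPECIAL."""
    return SPECIAL.get(frozenset({a, b}), frozenset({a, b}))


G = induced_convexity(op, E)
print(sorted(op(0, 1)), sorted(iterative_hull(op, {0, 1})))  # [0, 1, 2] [0, 1, 2, 3]
```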
Convexity induced by $\langle \cdot,\cdot \rangle_{\mathcal{C}}$
Lemma 1. Let $\mathcal{C}$ be a convexity on $E$.
(i) $\langle \cdot,\cdot \rangle_{\mathcal{C}}$ and $\langle \cdot,\cdot \rangle_{\langle \cdot,\cdot \rangle_{\mathcal{C}}}$ coincide.
(ii) $\{ \langle a, b \rangle_{\mathcal{C}} \mid a, b \in E \} \subseteq \mathcal{C} \subseteq \mathcal{G}_{\langle \cdot,\cdot \rangle_{\mathcal{C}}}$, where both inclusions may be strict.
Remark. It is easily checked that $xyz \in s(\mathcal{C}) \iff z \notin \langle x, y \rangle_{\mathcal{C}}$.
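Here is a small check of Lemma 1 and of the Remark on a hypothetical convexity $\mathcal{C}$ on $\{a, b, c, d\}$. $\mathcal{C}$ is chosen so that $\{a, b, c\}$ is not convex although every segment between two of its points stays inside it, which makes both inclusions in (ii) strict. This is an illustration on one example, not a proof.

```python
from itertools import combinations, product


def hull(family, subset):
    """Intersection of the members containing `subset` (E is assumed to be a member)."""
    return frozenset.intersection(*[c for c in family if subset <= c])


def separation_relation(family, ground):
    """s(F): triples (x, y, z) such that some member contains x and y but not z."""
    return {(x, y, z) for x, y, z in product(ground, repeat=3)
            if any(x in a and y in a and z not in a for a in family)}


def induced_convexity(op, ground):
    """G_I by brute force over all subsets of E."""
    subsets = [frozenset(s) for r in range(len(ground) + 1) for s in combinations(ground, r)]
    return {c for c in subsets if all(op(x, y) <= c for x in c for y in c)}


# Hypothetical convexity: ⟨{a, b, c}⟩ = E, yet every segment between a, b, c stays in {a, b, c}.
E = "abcd"
C = {frozenset(s) for s in ["", "a", "b", "c", "d", "ab", "bc", "ac", "abcd"]}


def seg(x, y):
    """Segment operator of C."""
    return hull(C, frozenset([x, y]))


G = induced_convexity(seg, E)
segments = {seg(x, y) for x in E for y in E}
print(segments < C < G)  # True: both inclusions of Lemma 1 (ii) are strict here
```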
Interval operators and Cluster Analysis
◮ $B_d$ and $D_d$ are two interval operators defined on $E$.
Lemma 2. For every dissimilarity $d$ on $E$ and all $x, y \in E$, there exist $u, v \in E$ such that $\langle x, y \rangle_{B_d} = \langle u, v \rangle_{B_d} = B_d(u, v)$.
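One way to exhibit $u$ and $v$ in Lemma 2 is our own choice; the slide does not state how they are obtained. Take a diametral pair of $H = \langle x, y \rangle_{B_d}$. For a symmetric $d$, every $t \in H$ satisfies $\max\{d(t, u), d(t, v)\} \le \mathrm{diam}_d H = d(u, v)$, so $H \subseteq B_d(u, v) \subseteq \langle u, v \rangle_{B_d} \subseteq H$. The sketch below checks this on random symmetric dissimilarities with zero diagonal. `random_dissimilarity` and `witness` are hypothetical helpers.

```python
import random
from itertools import combinations


def random_dissimilarity(n, seed):
    """Hypothetical test data: symmetric, zero diagonal, positive integers off the diagonal."""
    rng = random.Random(seed)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = rng.randint(1, 5)
    return d


def two_ball(d, ground, x, y):
    """B_d(x, y) = { z : max(d(z, x), d(z, y)) <= d(x, y) }."""
    return frozenset(z for z in ground if max(d[z][x], d[z][y]) <= d[x][y])


def segment(d, ground, x, y):
    """⟨x, y⟩ in G_{B_d}, computed with Calder's iteration."""
    current = frozenset([x, y])
    while True:
        nxt = frozenset().union(*(two_ball(d, ground, a, b) for a in current for b in current))
        if nxt == current:
            return current
        current = nxt


def witness(d, ground, x, y):
    """Return a diametral pair (u, v) of ⟨x, y⟩, i.e. one maximizing d(u, v)."""
    seg = segment(d, ground, x, y)
    return max(((u, v) for u in seg for v in seg), key=lambda p: d[p[0]][p[1]])


E = list(range(6))
d = random_dissimilarity(6, seed=1)
u, v = witness(d, E, 0, 1)
print(segment(d, E, 0, 1) == segment(d, E, u, v) == two_ball(d, E, u, v))  # True
```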
Separation, interval operators and weak hierarchies
◮ Bandelt and Dress (1994): a set-system $\mathcal{C}$ is weakly hierarchical iff, for all distinct $x_1, x_2, x_3 \in E$, $s(\mathcal{C})$ does not contain all three of $x_1x_2x_3$, $x_2x_3x_1$ and $x_3x_1x_2$.
◮ Let $I$ be an interval operator on $E$, and consider the condition
(W) there exist no $x, y, z \in E$ such that $x \notin I(y, z)$, $y \notin I(x, z)$ and $z \notin I(x, y)$.
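The sketch below places the Bandelt and Dress criterion next to the definition of a weak hierarchy and adds a test of condition (W) for an interval operator. Helper names and the two example operators (`line`, `discrete`) are hypothetical.

```python
from itertools import combinations, permutations, product


def separation_relation(family, ground):
    """s(C): triples (x, y, z) such that some member contains x and y but not z."""
    return {(x, y, z) for x, y, z in product(ground, repeat=3)
            if any(x in a and y in a and z not in a for a in family)}


def bandelt_dress_test(family, ground) -> bool:
    """True iff no distinct x1, x2, x3 have x1x2x3, x2x3x1 and x3x1x2 all in s(C)."""
    s = separation_relation(family, ground)
    return not any((x1, x2, x3) in s and (x2, x3, x1) in s and (x3, x1, x2) in s
                   for x1, x2, x3 in permutations(ground, 3))


def is_weakly_hierarchical(family) -> bool:
    """Definition: X ∩ Y ∩ Z ∈ {X ∩ Y, Y ∩ Z, X ∩ Z}."""
    return all((x & y & z) in (x & y, y & z, x & z)
               for x, y, z in combinations(list(family), 3))


def satisfies_w(op, ground) -> bool:
    """(W): no x, y, z with x ∉ I(y, z), y ∉ I(x, z) and z ∉ I(x, y)."""
    return not any(x not in op(y, z) and y not in op(x, z) and z not in op(x, y)
                   for x, y, z in product(ground, repeat=3))


def line(a, b):
    """Integer interval between a and b on a line."""
    return frozenset(range(min(a, b), max(a, b) + 1))


def discrete(a, b):
    """The smallest interval operator: I(a, b) = {a, b}."""
    return frozenset([a, b])


print(satisfies_w(line, range(5)), satisfies_w(discrete, range(3)))  # True False
```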
Proposition 3. Let $I$ be an interval operator, and consider the following conditions:
(i) $I$ satisfies (W);
(ii) $\langle \cdot,\cdot \rangle_I$ satisfies (W);
(iii) $\mathcal{G}_I$ is weakly hierarchical;
(iv) $\mathcal{G}_I$ has rank at most 2, i.e. if $\emptyset \neq A \subseteq E$, then $\langle A \rangle_I = \langle a, b \rangle_I$ for some $a, b \in A$.
Then (i) $\Rightarrow$ (ii) $\iff$ (iii) $\iff$ (iv).
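Proposition 3 can be checked exhaustively on small random interval operators. In this sketch, `random_interval_operator` is a hypothetical generator that adds random points to $\{a, b\}$. Each of (i) to (iv) is evaluated by brute force over $2^E$. Such a check illustrates the implications; it does not prove them.

```python
import random
from itertools import combinations, product


def all_subsets(ground):
    return [frozenset(s) for r in range(len(ground) + 1) for s in combinations(ground, r)]


def induced_convexity(op, ground):
    """G_I by brute force over 2^E."""
    return [c for c in all_subsets(ground) if all(op(x, y) <= c for x in c for y in c)]


def hull(convexity, subset):
    """Intersection of the convex sets containing `subset`."""
    return frozenset.intersection(*[c for c in convexity if subset <= c])


def satisfies_w(op, ground) -> bool:
    """(W): no x, y, z with x ∉ I(y, z), y ∉ I(x, z) and z ∉ I(x, y)."""
    return not any(x not in op(y, z) and y not in op(x, z) and z not in op(x, y)
                   for x, y, z in product(ground, repeat=3))


def proposition_3_conditions(op, ground):
    """Evaluate conditions (i)-(iv) of Proposition 3 for the interval operator `op`."""
    conv = induced_convexity(op, ground)

    def seg(a, b):  # segment operator of G_I
        return hull(conv, frozenset([a, b]))

    cond_i = satisfies_w(op, ground)
    cond_ii = satisfies_w(seg, ground)
    cond_iii = all((x & y & z) in (x & y, y & z, x & z) for x, y, z in combinations(conv, 3))
    cond_iv = all(any(hull(conv, a) == seg(p, q) for p in a for q in a)
                  for a in all_subsets(ground) if a)
    return cond_i, cond_ii, cond_iii, cond_iv


def random_interval_operator(ground, seed, p=0.3):
    """Hypothetical generator: I(a, b) = {a, b} plus random extra points, I(a, a) = {a}."""
    rng = random.Random(seed)
    table = {}
    for a, b in combinations(ground, 2):
        extra = frozenset(x for x in ground if rng.random() < p)
        table[frozenset((a, b))] = frozenset((a, b)) | extra

    def op(a, b):
        return table.get(frozenset((a, b)), frozenset((a, b)))

    return op


E = list(range(5))
for seed in range(5):
    print(proposition_3_conditions(random_interval_operator(E, seed), E))
```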
Corollary 4. If the interval operator $I$ satisfies (W), then $\mathcal{G}_I = \{ \langle a, b \rangle_I \mid a, b \in E \} \cup \{\emptyset\}$.
Definition 5 ($k$-ball). Let $A \subseteq E$ with $\#A = k > 2$, let $\mathrm{diam}_d A = \max\{ d(a, a') : a, a' \in A \}$, and denote
$B_d^A = \{ x \in E \mid \forall a \in A, \; d(a, x) \le \mathrm{diam}_d A \}$.
Proposition 6. If $(\mathcal{C}, f)$ is an indexed closed weakly hierarchical set-system such that $f^{-1}(0) = \{ X \in \mathcal{C} \mid f(X) = 0 \}$ is a partition of $E$, then
$B_{\varphi(\mathcal{C}, f)}^A = \langle A \rangle_{\mathcal{C} \cup \{\emptyset\}}$ for every nonempty subset $A$ of $E$.
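This block checks Proposition 6 on one hypothetical example. The set-system is closed, total and weakly hierarchical on $\{a, b, c, d\}$ (intervals of the order $a < b < c < d$). It is indexed by $f(A) = |A| - 1$, so that $f^{-1}(0)$ is the partition into singletons. The helper names are assumptions, and the check covers this single example only.

```python
from itertools import combinations


def phi(family, index, ground):
    """φ(C, f)(x, y) = min{ f(A) : x, y ∈ A ∈ C }."""
    return {(x, y): min(index[a] for a in family if x in a and y in a)
            for x in ground for y in ground}


def k_ball(d, ground, subset):
    """B_d^A = { x ∈ E : d(a, x) <= diam_d(A) for all a ∈ A }."""
    diam = max(d[(a, b)] for a in subset for b in subset)
    return frozenset(x for x in ground if all(d[(a, x)] <= diam for a in subset))


def hull(family, subset):
    """⟨A⟩ in C ∪ {∅}: intersection of the members containing A."""
    return frozenset.intersection(*[c for c in family if subset <= c])


# Hypothetical example: intervals of a < b < c < d, indexed by f(A) = |A| - 1.
E = "abcd"
C = [frozenset(s) for s in ["a", "b", "c", "d", "ab", "bc", "cd", "abc", "bcd", "abcd"]]
d = phi(C, {A: len(A) - 1 for A in C}, E)
C_empty = C + [frozenset()]
print(all(k_ball(d, E, frozenset(s)) == hull(C_empty, frozenset(s))
          for r in range(1, len(E) + 1) for s in combinations(E, r)))  # True
```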
Notation 7. $B_{(\mathcal{C}, f)} := B_{\varphi(\mathcal{C}, f)}$.
Corollary 8. Let $\mathcal{C}$ be a set-system on $E$. The following are equivalent:
(i) $\mathcal{C}$ is closed and weakly hierarchical;
(ii) $\mathcal{C} \cup \{\emptyset\} = \mathcal{G}_I$ for some interval operator $I$ satisfying (W);
(iii) $\mathcal{C} \cup \{\emptyset\} = \mathcal{G}_{B_{(\mathcal{C}, f)}}$ for some index $f$ on $\mathcal{C}$ satisfying $\bigcup f^{-1}(0) = E$;
(iv) $\mathcal{C} \cup \{\emptyset\} = \mathcal{G}_{B_{(\mathcal{C}, f)}}$ for every index $f$ on $\mathcal{C}$ satisfying $\bigcup f^{-1}(0) = E$.
Criterion to recognize whether a set-system is weakly hierarchical: define $f$ by $f(A) := |A| - 1$.
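The criterion can be sketched as follows. The sketch assumes the input is a total set-system given as strings or sets of labels, so that $\bigcup f^{-1}(0) = E$ holds for $f(A) = |A| - 1$. The function name is hypothetical. Under that assumption, Corollary 8 says it returns `True` exactly when the set-system is closed and weakly hierarchical.

```python
from itertools import combinations


def recognizes_closed_weak_hierarchy(family, ground) -> bool:
    """Corollary 8 criterion with the index f(A) = |A| - 1.

    Assumes `family` is a total set-system (so the union of f^{-1}(0) is E).
    Returns True iff C ∪ {∅} equals the convexity induced by the 2-balls of φ(C, f).
    """
    members = {frozenset(a) for a in family}
    d = {(x, y): min(len(a) - 1 for a in members if x in a and y in a)
         for x in ground for y in ground}

    def ball(x, y):  # B_d(x, y)
        return frozenset(z for z in ground if max(d[(z, x)], d[(z, y)]) <= d[(x, y)])

    convex = {frozenset(s) for r in range(len(ground) + 1) for s in combinations(ground, r)
              if all(ball(x, y) <= frozenset(s) for x in s for y in s)}
    return convex == members | {frozenset()}


pyramid = ["a", "b", "c", "d", "ab", "bc", "cd", "abc", "bcd", "abcd"]
triangle = ["a", "b", "c", "ab", "bc", "ac", "abc"]
print(recognizes_closed_weak_hierarchy(pyramid, "abcd"),
      recognizes_closed_weak_hierarchy(triangle, "abc"))  # True False
```

The third member of `triangle` completes the forbidden configuration on $a, b, c$, which is why the criterion rejects it.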