Agglomerative 2-3 Hierarchical Clustering: theoretical improvements and tests

Sergiu Chelcea 1, Patrice Bertrand 1,2, Brigitte Trousse 1
1. Action AxIS, INRIA Sophia-Antipolis, France
2. ENST Bretagne, France
LastName.FirstName@inria.fr
GfKl 2003, 13 March 2003

Outline
• The classical case of AHC
• 2-3 Hierarchies
  - Definitions
  - Properties
• Algorithm of 2-3AHC
• Analysis of complexity
• Application on simulated data
  - Experimental validation of complexity
• Ongoing and future work

Context
[Diagram relating the following families of cluster structures]
- Hierarchies
- 2-3 Hierarchies (Bertrand 2002)
- Pyramids (Diday 1984-86, Fichet 1987)
- Weak Hierarchies (Bandelt, Dress 1989; Diatta, Fichet 1994)

Hierarchies (1/3)
We recall some definitions related to the hierarchical case that will be extended to the 2-3 hierarchies:
• Hierarchy:
- each cluster is nonempty
- E and the singletons are clusters
- each pair of clusters (A, B) is hierarchical: $A \cap B \in \{\emptyset, A, B\}$
[Figure: clusters A, A1, A2 and B]
Remark:
- a hierarchy admits at most n-1 non-trivial clusters, where n = |E|
• Indexed hierarchy:
- each cluster is associated with a positive real number f, where $\forall A, B \in S,\ A \subset B \Rightarrow f(A) < f(B)$

Agglomerative Hierarchical Classification (2/3)
Vocabulary:
- set inclusion order on the set of clusters:
  - predecessor/successor
  - comparable clusters
  - candidate clusters (unmarked) = maximal clusters
- data input: dissimilarity $\delta : E \times E \to [0, \infty)$ with $\delta(a,b) = \delta(b,a) \ge \delta(a,a) = 0,\ \forall a, b \in E$
- aggregation index (link between clusters), $\mu$:
  - single linkage
  - complete linkage
  - average linkage
- usually $f(X \cup Y) = \mu(X, Y)$
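The three aggregation indices named above can be illustrated with a minimal sketch. The function names and the toy dissimilarity (absolute difference between points on a line) are assumptions made for illustration and do not come from the slides.

```python
from itertools import product

def single_linkage(X, Y, delta):
    # mu(X, Y) = smallest dissimilarity between a member of X and a member of Y
    return min(delta(a, b) for a, b in product(X, Y))

def complete_linkage(X, Y, delta):
    # mu(X, Y) = largest dissimilarity between a member of X and a member of Y
    return max(delta(a, b) for a, b in product(X, Y))

def average_linkage(X, Y, delta):
    # mu(X, Y) = mean dissimilarity over all cross pairs
    return sum(delta(a, b) for a, b in product(X, Y)) / (len(X) * len(Y))

# Toy dissimilarity (assumption): points on the real line
def delta(a, b):
    return abs(a - b)

X, Y = {0, 1}, {4, 6}
mu_single = single_linkage(X, Y, delta)
mu_complete = complete_linkage(X, Y, delta)
mu_average = average_linkage(X, Y, delta)
```

With these toy clusters the cross dissimilarities are 4, 6, 3 and 5, so the three indices give different values for the same pair (X, Y).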

Algorithm AHC (3/3)
1. Initialisation: iter ← 0; clusters are the singletons of set E; f ← 0;
2. iter ← iter + 1; Merge the clusters X and Y which are, in the sense of µ, the two nearest clusters; compute f(X ∪ Y)
3. Reduction: Eliminate the successors found on the same level f with their predecessor, if there are any
4. Update µ, predecessor links, successor links
5. Stopping rule: Repeat steps 2-4 until the set E becomes a cluster
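The steps above can be sketched as a naive loop. This is an illustrative sketch only: it recomputes µ from scratch at every iteration, omits the reduction step and the predecessor/successor links, and takes f(X ∪ Y) = µ(X, Y). The function name `ahc` and the `linkage` parameter are assumptions.

```python
from itertools import combinations

def ahc(points, delta, linkage=min):
    """Naive AHC sketch. linkage=min gives single linkage, linkage=max complete linkage.

    Returns the list of (merged cluster, level f) in merge order.
    """
    def mu(X, Y):
        return linkage(delta(a, b) for a in X for b in Y)

    # Step 1: the clusters are the singletons of E
    clusters = [frozenset([p]) for p in points]
    merges = []
    # Step 5: repeat until E itself is the only maximal cluster
    while len(clusters) > 1:
        # Step 2: pick the two nearest clusters in the sense of mu
        X, Y = min(combinations(clusters, 2), key=lambda pair: mu(*pair))
        level = mu(X, Y)  # f(X u Y) = mu(X, Y)
        # Step 4 (simplified): X and Y stop being candidates, X u Y becomes one
        clusters = [c for c in clusters if c not in (X, Y)] + [X | Y]
        merges.append((X | Y, level))
    return merges

merges = ahc([0, 1, 4, 6], lambda a, b: abs(a - b))
levels = [level for _, level in merges]
```

On the four toy points the sketch merges {0, 1}, then {4, 6}, then the whole set, at increasing levels.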

2-3 Hierarchies: Definitions
• Proper intersection:
- A properly intersects B if $A \cap B \notin \{\emptyset, A, B\}$
[Figure: two properly intersecting clusters A and B]
Concept:
- in a 2-3 hierarchy, for any three clusters, at least two pairs of them are hierarchical
• 2-3 Hierarchy [Bertrand 2002]:
- each cluster is nonempty
- E and singletons are clusters
- the proper intersection of two clusters is also a cluster
- each cluster properly intersects no more than one other cluster
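The four conditions listed above can be checked mechanically on a small set system. The sketch below is a direct transcription of those conditions; the function names are assumptions introduced for illustration.

```python
def properly_intersects(A, B):
    # A properly intersects B when A n B is neither empty, nor A, nor B
    common = A & B
    return bool(common) and common != A and common != B

def is_23_hierarchy(clusters, E):
    C = {frozenset(c) for c in clusters}
    E = frozenset(E)
    # each cluster is nonempty
    if any(len(c) == 0 for c in C):
        return False
    # E and the singletons are clusters
    if E not in C or any(frozenset([x]) not in C for x in E):
        return False
    for A in C:
        partners = [B for B in C if properly_intersects(A, B)]
        # each cluster properly intersects no more than one other cluster
        if len(partners) > 1:
            return False
        # the proper intersection of two clusters is also a cluster
        if any((A & B) not in C for B in partners):
            return False
    return True

singletons3 = [{1}, {2}, {3}]
ok = is_23_hierarchy(singletons3 + [{1, 2}, {2, 3}, {1, 2, 3}], {1, 2, 3})

singletons4 = [{1}, {2}, {3}, {4}]
# {2, 3} properly intersects both {1, 2} and {3, 4}, which is not allowed
bad = is_23_hierarchy(singletons4 + [{1, 2}, {2, 3}, {3, 4}, {1, 2, 3, 4}], {1, 2, 3, 4})
```

The first system is a 2-3 hierarchy but not a hierarchy, since {1, 2} and {2, 3} overlap properly; the second fails because one cluster has two proper-intersection partners.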

2-3 Hierarchies: Properties [Bertrand 2002]
• The number of elements of a 2-3 hierarchy that are not reduced to singletons is at most $\left\lfloor \tfrac{3}{2}(n-1) \right\rfloor$
• Each 2-3 hierarchical set system on E is a collection of intervals of some linear order defined on E.
[Figure: a 2-3 hierarchy and a pyramid]

Algorithm of 2-3AHC
1. Initialisation: iter ← 0; clusters are the singletons of set E; f ← 0;
2. iter ← iter + 1; Merge the clusters X and Y which are, in the sense of µ, the two nearest non-comparable clusters such that at least one of them is maximal; compute f(X ∪ Y)
3. Merge X ∪ Y and the other predecessor of X or Y, if it exists; compute f(X ∪ Y)
4. Reduction: Eliminate the successors found on the same level f with their predecessor, if there are any
5. Update µ, predecessor links, successor links
6. Stopping rule: Repeat steps 2-5 until the set E becomes a cluster
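The selection rule of step 2 is the main departure from classical AHC, and it can be sketched on its own. The helper names below are assumptions, "maximal" is taken to mean "not strictly contained in another current cluster", and complete linkage on points of a line stands in for µ. The sketch covers pair selection only, not the full algorithm.

```python
from itertools import combinations

def is_maximal(c, clusters):
    # maximal = not strictly contained in any other current cluster
    return not any(c < other for other in clusters)

def comparable(X, Y):
    # comparable for set inclusion: one contains the other
    return X <= Y or Y <= X

def select_merge_pair(clusters, mu):
    """Nearest non-comparable pair with at least one maximal member (step 2)."""
    admissible = [
        (X, Y)
        for X, Y in combinations(clusters, 2)
        if not comparable(X, Y)
        and (is_maximal(X, clusters) or is_maximal(Y, clusters))
    ]
    return min(admissible, key=lambda pair: mu(*pair), default=None)

def complete_linkage(X, Y):
    # toy aggregation index (assumption): points on the real line
    return max(abs(a - b) for a in X for b in Y)

clusters = [frozenset(c) for c in ({0}, {1}, {4}, {0, 1})]
pair = select_merge_pair(clusters, complete_linkage)
```

Here {1} is no longer maximal because it lies inside {0, 1}, yet it can still be merged with the maximal cluster {4}. The resulting {1, 4} would properly intersect {0, 1}, which is the situation that step 3 then handles.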

Algorithm of 2-3AHC
• Generalizes the AHC:
- a cluster can be merged with two different clusters
• Double single linkage [Jullien, Bertrand 2002]:
$f(X \cup Y) = \min \{ \mu(X, Y),\ \mu(X \cup Y, Z) : Z \text{ candidate cluster} \}$
• Complexity: $O(n^2 \log n)$

Analysis of Complexity (1/3)
We use an ordered dissimilarity matrix on three levels:
- dissimilarity values
- cardinality of the two clusters
- lexicographical order
Step 1. Initialisation: Compute and order the dissimilarity matrix, $O(n^2 \log n)$
Step 2. Merge X and Y …: Retrieve (X, Y) from the data structure, and create X ∪ Y, O(1)
Step 3. Merge X ∪ Y and …: Intermediate merging with O(n) complexity
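The three-level ordering can be expressed as a composite sort key. In this sketch each entry is a hypothetical triple (X, Y, d) with clusters stored as sorted tuples of labels. Using the sum of the two cardinalities for the second level is an assumption, since the slides do not say how the two cardinalities are combined.

```python
def ordering_key(entry):
    X, Y, d = entry
    # level 1: dissimilarity value
    # level 2: cardinality of the two clusters (sum, by assumption)
    # level 3: lexicographical order on the cluster labels
    return (d, len(X) + len(Y), (X, Y))

entries = [
    (("b",), ("c",), 1.0),
    (("a", "b"), ("c",), 1.0),
    (("a",), ("c",), 2.0),
    (("a",), ("b",), 1.0),
]

ordered = sorted(entries, key=ordering_key)
ordered_pairs = [(X, Y) for X, Y, _ in ordered]
```

Sorting the n(n-1)/2 initial entries this way costs O(n² log n) comparisons, which matches the bound given for Step 1, and the first entry of the ordered structure is the pair to merge next.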

Analysis of Complexity (2/3)
Step 4. Reduction: We have five possible cases of reduction when merging a cluster:
[Figure: the reduction cases, labelled α, β1, β2 X', β2 Y', β2 Z]
- eliminate the successors found on the same level with their predecessor
- complexity O(n)

Analysis of Complexity (3/3)
Step 5. Update µ:
- compute new dissimilarities and store them in the matrix, $O(n \log n)$
- eliminate dissimilarities containing non-candidate clusters, $O(n \log n)$
Total complexity of the algorithm:
$\underbrace{O(n^2 \log n)}_{\text{step 1}} + \underbrace{n \times O(n \log n)}_{\text{steps 2-5}} \rightarrow O(n^2 \log n)$
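The per-iteration term can be unpacked from the step costs stated on these slides. The symbols $T_{\text{iter}}$ and $T_{\text{total}}$ are introduced here for illustration only, and the number of iterations is taken to be at most n, as the factor n in the total suggests.

```latex
\begin{align*}
T_{\text{iter}} &= \underbrace{O(1)}_{\text{step 2}}
                 + \underbrace{O(n)}_{\text{step 3}}
                 + \underbrace{O(n)}_{\text{step 4}}
                 + \underbrace{O(n \log n)}_{\text{step 5}}
                 = O(n \log n), \\
T_{\text{total}} &= \underbrace{O(n^2 \log n)}_{\text{step 1}} + n \cdot T_{\text{iter}}
                  = O(n^2 \log n).
\end{align*}
```

Under this reading the update of µ in step 5 dominates each iteration, and the initial sort and the main loop contribute terms of the same order.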
