Diameter, radius, discrete radius

diameter, radius, discrete radius

$D : M \times M \to \mathbb{R}$ distance function, $S \subset M$, $|S| < \infty$
▶ $\mathrm{diam}_D(S) := \max_{x, y \in S} D(x, y)$ (diameter of $S$)
▶ $\mathrm{rad}_D(S) := \min_{m \in M} \max_{x \in S} D(x, m)$ (radius of $S$)
▶ $\mathrm{drad}_D(S) := \min_{m \in S} \max_{x \in S} D(x, m)$ (discrete radius of $S$)

$P \subset M$, $|P| < \infty$, $\mathcal{C} = \{C_1, \dots, C_k\}$ partition of $P$
▶ $\mathrm{cost}^D_{\mathrm{diam}}(\mathcal{C}) := \max_{1 \le i \le k} \mathrm{diam}_D(C_i)$ (diameter cost)
▶ $\mathrm{cost}^D_{\mathrm{rad}}(\mathcal{C}) := \max_{1 \le i \le k} \mathrm{rad}_D(C_i)$ (radius cost)
▶ $\mathrm{cost}^D_{\mathrm{drad}}(\mathcal{C}) := \max_{1 \le i \le k} \mathrm{drad}_D(C_i)$ (discrete radius cost)
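The diameter and discrete radius can be evaluated directly from their definitions on a finite set. The following is a minimal sketch, assuming points are tuples of floats and that the Euclidean distance `math.dist` stands in for $D$; the function names are illustrative and not part of the slides. The radius $\mathrm{rad}_D$ is omitted because its minimum ranges over all of $M$, which is not finite in general.

```python
import math

def diam(S, D=math.dist):
    """Diameter of S: largest distance between any two points of S."""
    return max(D(x, y) for x in S for y in S)

def drad(S, D=math.dist):
    """Discrete radius of S: best center chosen from S itself."""
    return min(max(D(x, m) for x in S) for m in S)

def cost_diam(clustering, D=math.dist):
    """Diameter cost: largest cluster diameter in the partition."""
    return max(diam(C, D) for C in clustering)

def cost_drad(clustering, D=math.dist):
    """Discrete radius cost: largest cluster discrete radius."""
    return max(drad(C, D) for C in clustering)

if __name__ == "__main__":
    S = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    print(diam(S), drad(S))
```

On three collinear points such as the ones in the usage example, the middle point is the best discrete center, so the discrete radius is half the diameter.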

diameter, radius, discrete radius

Problem 6.1 (diameter k-clustering)
Given a set $P$, $|P| < \infty$, $k \in \mathbb{N}$, find a partition $\mathcal{C}$ of $P$ into $k$ clusters $C_1, \dots, C_k$ that minimizes $\mathrm{cost}^D_{\mathrm{diam}}(\mathcal{C})$.

Problem 6.2 (radius k-clustering)
Given a set $P$, $|P| < \infty$, $k \in \mathbb{N}$, find a partition $\mathcal{C}$ of $P$ into $k$ clusters $C_1, \dots, C_k$ that minimizes $\mathrm{cost}^D_{\mathrm{rad}}(\mathcal{C})$.

Problem 6.3 (discrete radius k-clustering)
Given a set $P$, $|P| < \infty$, $k \in \mathbb{N}$, find a partition $\mathcal{C}$ of $P$ into $k$ clusters $C_1, \dots, C_k$ that minimizes $\mathrm{cost}^D_{\mathrm{drad}}(\mathcal{C})$.

Diameter clustering

Agglomerative clustering - setup and idea

$D : M \times M \to \mathbb{R}$ distance function, $P \subset M$, $|P| = n$, $P = \{p_1, \dots, p_n\}$

Basic idea of agglomerative clustering
▶ start with $n$ clusters $C_i$, $1 \le i \le n$, $C_i := \{p_i\}$
▶ in each step replace two clusters $C_i, C_j$ that are "closest" by their union $C_i \cup C_j$
▶ until a single cluster is left.

Observation
Computes a $k$-clustering for $k = n, \dots, 1$.

Complete linkage

Definition 6.4
For $C_1, C_2 \subset M$
$D_{CL}(C_1, C_2) := \max_{x \in C_1, y \in C_2} D(x, y)$
is called the complete linkage cost of $C_1, C_2$.

[Figure: $D_{CL}(C_1, C_2)$]

Agglomerative clustering with complete linkage

AgglomerativeCompleteLinkage($P$)
    $\mathcal{C}_n := \{\{p_i\} \mid p_i \in P\}$;
    for $i = n - 1, \dots, 1$ do
        find distinct clusters $A, B \in \mathcal{C}_{i+1}$ minimizing $D_{CL}(A, B)$;
        $\mathcal{C}_i := (\mathcal{C}_{i+1} \setminus \{A, B\}) \cup \{A \cup B\}$;
    end
    return $\mathcal{C}_1, \dots, \mathcal{C}_n$ (or single $\mathcal{C}_k$)

[Figure: example point set A, B, C, D, E]

Agglomerative clustering with complete linkage

Theorem 6.5
Algorithm AgglomerativeCompleteLinkage requires time $O(n^2 \log n)$ and space $O(n^2)$.
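The AgglomerativeCompleteLinkage procedure can be sketched in a few lines. This is a naive illustration, assuming distinct points given as tuples and `math.dist` as the distance $D$; it rescans all cluster pairs in every round, so it does not attain the $O(n^2 \log n)$ bound of Theorem 6.5. The names `complete_linkage` and `agglomerative_complete_linkage` are illustrative.

```python
import math
from itertools import combinations

def complete_linkage(A, B, D=math.dist):
    """Complete linkage cost: largest distance between a point of A and a point of B."""
    return max(D(x, y) for x in A for y in B)

def agglomerative_complete_linkage(P, D=math.dist):
    """Return a dict mapping k to the computed k-clustering, for k = n, ..., 1.

    Assumes the points in P are distinct and hashable (e.g. tuples).
    """
    clustering = [frozenset([p]) for p in P]
    levels = {len(clustering): list(clustering)}
    while len(clustering) > 1:
        # Pick the pair of distinct clusters with the smallest complete linkage cost.
        A, B = min(combinations(clustering, 2),
                   key=lambda pair: complete_linkage(pair[0], pair[1], D))
        # Replace A and B by their union.
        clustering = [C for C in clustering if C is not A and C is not B]
        clustering.append(A | B)
        levels[len(clustering)] = list(clustering)
    return levels

if __name__ == "__main__":
    P = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
    for k, clusters in sorted(agglomerative_complete_linkage(P).items()):
        print(k, [sorted(C) for C in clusters])
```

Ties between equally close pairs are broken by scan order here; the slides do not specify a tie-breaking rule.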

Approximation guarantees

▶ $\mathrm{diam}_D(S) := \max_{x, y \in S} D(x, y)$ (diameter of $S$)
▶ $\mathrm{cost}^D_{\mathrm{diam}}(\mathcal{C}) := \max_{1 \le i \le k} \mathrm{diam}_D(C_i)$ (diameter cost)
▶ $\mathrm{opt}^{\mathrm{diam}}_k(P) := \min_{|\mathcal{C}| = k} \mathrm{cost}^D_{\mathrm{diam}}(\mathcal{C})$

Theorem 6.6
Let $D$ be a distance metric on $M \subseteq \mathbb{R}^d$. Then for all sets $P$ and all $k \le |P|$, Algorithm AgglomerativeCompleteLinkage computes a $k$-clustering $\mathcal{C}_k$ with
$\mathrm{cost}^D_{\mathrm{diam}}(\mathcal{C}_k) \le O\big(\mathrm{opt}^{\mathrm{diam}}_k(P)\big)$,
where the constant hidden in the O-notation is double exponential in $d$.

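Approximation guarantees

Theorem 6.7
There is a point set $P \subset \mathbb{R}^2$ such that for the metric $D_{\ell_\infty}$ algorithm AgglomerativeCompleteLinkage computes a clustering $\mathcal{C}_k$ with
$\mathrm{cost}^D_{\mathrm{diam}}(\mathcal{C}_k) = 3 \cdot \mathrm{opt}^{\mathrm{diam}}_k(P)$.

[Figure: point set with points A, B, C, D, E, F, G, H]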

Approximation guarantees

Theorem 6.8
There is a point set $P \subset \mathbb{R}^d$, $d = k + \log k$, such that for the metric $D_{\ell_1}$ algorithm AgglomerativeCompleteLinkage computes a clustering $\mathcal{C}_k$ with
$\mathrm{cost}^{D_{\ell_1}}_{\mathrm{diam}}(\mathcal{C}_k) \ge \tfrac{1}{2} \log k \cdot \mathrm{opt}^{\mathrm{diam}}_k(P)$.

Corollary 6.9
For every $1 \le p < \infty$, there is a point set $P \subset \mathbb{R}^d$, $d = k + \log k$, such that for the metric $D_{\ell_p}$ algorithm AgglomerativeCompleteLinkage computes a clustering $\mathcal{C}_k$ with
$\mathrm{cost}^{D_{\ell_p}}_{\mathrm{diam}}(\mathcal{C}_k) \ge \sqrt[p]{\tfrac{1}{2} \log k} \cdot \mathrm{opt}^{\mathrm{diam}}_k(P)$.

Hardness of diameter clustering

Theorem 6.10
For the metric $D_{\ell_2}$ the diameter k-clustering problem is NP-hard. Moreover, assuming P ≠ NP, there is no polynomial time approximation algorithm for the diameter k-clustering problem with approximation factor $\le 1.96$.

Hardness of diameter clustering

▶ $\Delta \in \mathbb{R}^{n \times n}_{\ge 0}$, $\Delta_{xy}$ := $(x, y)$-entry in $\Delta$, $1 \le x, y \le n$
▶ $\mathcal{C} = \{C_1, \dots, C_k\}$ partition of $\{1, \dots, n\}$
▶ $\mathrm{cost}^{\Delta}_{\mathrm{diam}}(\mathcal{C}) := \max_{1 \le i \le k} \max_{x, y \in C_i} \Delta_{xy}$

Problem 6.11 (matrix diameter k-clustering)
Given a matrix $\Delta \in \mathbb{R}^{n \times n}_{\ge 0}$, $k \in \mathbb{N}$, find a partition $\mathcal{C}$ of $\{1, \dots, n\}$ into $k$ clusters $C_1, \dots, C_k$ that minimizes $\mathrm{cost}^{\Delta}_{\mathrm{diam}}(\mathcal{C})$.

Theorem 6.12
The matrix diameter k-clustering problem is NP-hard. Moreover, assuming P ≠ NP, there is no polynomial time approximation algorithm for the matrix diameter k-clustering problem with approximation factor $\alpha$, for arbitrary $\alpha \ge 1$.

Maximum distance k-clustering

Problem 6.13 (maximum distance k-clustering)
Given distance measure $D : M \times M \to \mathbb{R}$, $k \in \mathbb{N}$, and $P \subset M$, find a partition $\mathcal{C} = \{C_1, \dots, C_k\}$ of $P$ into $k$ clusters that maximizes
$\min_{x \in C_i,\, y \in C_j,\, i \ne j} D(x, y)$,
i.e. a partition that maximizes the minimum distance between points in different clusters.

Definition 6.14
For $C_1, C_2 \subset M$
$D_{SL}(C_1, C_2) := \min_{x \in C_1, y \in C_2} D(x, y)$
is called the single linkage cost of $C_1, C_2$.

Agglomerative clustering with single linkage

AgglomerativeSingleLinkage($P$)
    $\mathcal{C}_n := \{\{p_i\} \mid p_i \in P\}$;
    for $i = n - 1, \dots, 1$ do
        find distinct clusters $A, B \in \mathcal{C}_{i+1}$ minimizing $D_{SL}(A, B)$;
        $\mathcal{C}_i := (\mathcal{C}_{i+1} \setminus \{A, B\}) \cup \{A \cup B\}$;
    end
    return $\mathcal{C}_1, \dots, \mathcal{C}_n$ (or single $\mathcal{C}_k$)

Theorem 6.15
Algorithm AgglomerativeSingleLinkage optimally solves the maximum distance k-clustering problem.
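To make the objective of Problem 6.13 concrete, the sketch below runs single linkage until $k$ clusters remain and then evaluates the minimum distance between points in different clusters. It assumes distinct points given as tuples and `math.dist` as $D$; the function names are illustrative, and the quadratic rescan per round is chosen for brevity, not efficiency.

```python
import math
from itertools import combinations

def single_linkage(A, B, D=math.dist):
    """Single linkage cost: smallest distance between a point of A and a point of B."""
    return min(D(x, y) for x in A for y in B)

def agglomerative_single_linkage(P, k, D=math.dist):
    """Merge the two clusters with the smallest single linkage cost until k remain."""
    clustering = [frozenset([p]) for p in P]
    while len(clustering) > k:
        A, B = min(combinations(clustering, 2),
                   key=lambda pair: single_linkage(pair[0], pair[1], D))
        clustering = [C for C in clustering if C is not A and C is not B]
        clustering.append(A | B)
    return clustering

def min_inter_cluster_distance(clustering, D=math.dist):
    """Objective of the maximum distance k-clustering problem (needs at least 2 clusters)."""
    return min(single_linkage(A, B, D) for A, B in combinations(clustering, 2))

if __name__ == "__main__":
    P = [(0, 0), (1, 0), (3, 0), (7, 0)]
    C2 = agglomerative_single_linkage(P, 2)
    print([sorted(C) for C in C2], min_inter_cluster_distance(C2))
```

In the usage example the points lie on a line, and the 2-clustering splits them at the largest gap, which is the best possible separation for two clusters.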

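diam, rad, and drad

▶ $\mathrm{drad}_D(S) := \min_{m \in S} \max_{x \in S} D(x, m)$ (discrete radius of $S$)
▶ $\mathrm{cost}^D_{\mathrm{drad}}(\mathcal{C}) := \max_{1 \le i \le k} \mathrm{drad}_D(C_i)$ (discrete radius cost)
▶ find a partition $\mathcal{C}$ of $P$ into $k$ clusters $C_1, \dots, C_k$ that minimizes $\mathrm{cost}^D_{\mathrm{drad}}(\mathcal{C})$ or $\mathrm{cost}^D_{\mathrm{rad}}(\mathcal{C})$.

Theorem 6.16
Let $D : M \times M \to \mathbb{R}$ be a metric, $P \subset M$ and $\mathcal{C} = \{C_1, \dots, C_k\}$ a partition of $P$. Then
1. $\mathrm{cost}_{\mathrm{drad}}(\mathcal{C}) \le \mathrm{cost}_{\mathrm{diam}}(\mathcal{C}) \le 2 \cdot \mathrm{cost}_{\mathrm{drad}}(\mathcal{C})$
2. $\tfrac{1}{2} \cdot \mathrm{cost}_{\mathrm{drad}}(\mathcal{C}) \le \mathrm{cost}_{\mathrm{rad}}(\mathcal{C}) \le \mathrm{cost}_{\mathrm{drad}}(\mathcal{C})$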

diam, rad, and drad

Corollary 6.17
Let $D : M \times M \to \mathbb{R}$ be a metric, $k \in \mathbb{N}$, and $P \subset M$. Then
1. $\mathrm{opt}^{\mathrm{drad}}_k(P) \le \mathrm{opt}^{\mathrm{diam}}_k(P) \le 2 \cdot \mathrm{opt}^{\mathrm{drad}}_k(P)$
2. $\tfrac{1}{2} \cdot \mathrm{opt}^{\mathrm{drad}}_k(P) \le \mathrm{opt}^{\mathrm{rad}}_k(P) \le \mathrm{opt}^{\mathrm{drad}}_k(P)$

Corollary 6.18
Assume there is a polynomial time c-approximation algorithm for the discrete radius k-clustering problem. Then there is a polynomial time 2c-approximation algorithm for the diameter k-clustering problem.
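One way to see Corollary 6.18 is to chain the inequalities stated in Theorem 6.16 and Corollary 6.17. The derivation below is a sketch under the assumption that $\mathcal{C}$ denotes the partition returned by the c-approximation algorithm for discrete radius, so that $\mathrm{cost}_{\mathrm{drad}}(\mathcal{C}) \le c \cdot \mathrm{opt}^{\mathrm{drad}}_k(P)$; the slides state the corollary without this proof.

```latex
\begin{align*}
\mathrm{cost}_{\mathrm{diam}}(\mathcal{C})
  &\le 2 \cdot \mathrm{cost}_{\mathrm{drad}}(\mathcal{C})
     && \text{(Theorem 6.16, part 1)} \\
  &\le 2c \cdot \mathrm{opt}^{\mathrm{drad}}_k(P)
     && \text{($c$-approximation for discrete radius)} \\
  &\le 2c \cdot \mathrm{opt}^{\mathrm{diam}}_k(P)
     && \text{(Corollary 6.17, part 1)}
\end{align*}
```

Hence the same partition $\mathcal{C}$ already serves as a 2c-approximate solution for the diameter cost, with no extra computation.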

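Clustering and Gonzales' algorithm

GonzalesAlgorithm($P$, $k$)
    $C := \{p\}$ for $p \in P$ arbitrary;
    for $i = 2, \dots, k$ do
        $q := \mathrm{argmax}_{y \in P} D(y, C)$;
        $C := C \cup \{q\}$;
    end
    compute partition $\mathcal{C} = \{C_1, \dots, C_k\}$ corresponding to $C$;
    return $C$ and $\mathcal{C}$

Here $D(y, C) := \min_{c \in C} D(y, c)$ is the distance from $y$ to the nearest center chosen so far, and the partition corresponding to $C$ assigns every point of $P$ to a nearest center. Since $C$ starts with one center, the loop adds $k - 1$ further centers, for $k$ in total.

Theorem 6.19
Algorithm GonzalesAlgorithm is a 2-approximation algorithm for the diameter, radius, and discrete radius k-clustering problem.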
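The farthest-point selection of GonzalesAlgorithm can be sketched as follows. This is an illustration under stated assumptions: points are tuples, `math.dist` plays the role of $D$, the "arbitrary" first center is simply the first point of the input, and $1 \le k \le |P|$. It picks exactly $k$ centers (the first one arbitrary, then $k - 1$ farthest points) and assigns every point to a nearest center.

```python
import math

def gonzales(P, k, D=math.dist):
    """Farthest-first traversal: return (centers, clusters) with k centers.

    Assumes 1 <= k <= len(P).
    """
    centers = [P[0]]  # arbitrary first center (assumption: first input point)
    # dist[i] is the distance from P[i] to its nearest chosen center.
    dist = [D(p, centers[0]) for p in P]
    for _ in range(k - 1):
        # Next center: the point farthest from all centers chosen so far.
        i = max(range(len(P)), key=dist.__getitem__)
        centers.append(P[i])
        dist = [min(d, D(p, P[i])) for d, p in zip(dist, P)]
    # Partition induced by the centers: each point joins a nearest center.
    clusters = [[] for _ in centers]
    for p in P:
        j = min(range(len(centers)), key=lambda j: D(p, centers[j]))
        clusters[j].append(p)
    return centers, clusters

if __name__ == "__main__":
    P = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(gonzales(P, 2))
```

Maintaining the `dist` array means each round costs one pass over $P$, so the center selection uses about $k \cdot |P|$ distance evaluations.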

Agglomerative clustering and discrete radius clustering

▶ $\mathrm{drad}_D(S) := \min_{m \in S} \max_{x \in S} D(x, m)$ (discrete radius of $S$)
▶ $\mathrm{cost}^D_{\mathrm{drad}}(\mathcal{C}) := \max_{1 \le i \le k} \mathrm{drad}_D(C_i)$ (discrete radius cost)
▶ find a partition $\mathcal{C}$ of $P$ into $k$ clusters $C_1, \dots, C_k$ that minimizes $\mathrm{cost}^D_{\mathrm{drad}}(\mathcal{C})$.

Discrete radius measure
$D_{\mathrm{drad}}(C_1, C_2) = \mathrm{drad}(C_1 \cup C_2)$

Agglomerative clustering with discrete radius cost

AgglomerativeDiscreteRadius($P$)
    $\mathcal{C}_n := \{\{p_i\} \mid p_i \in P\}$;
    for $i = n - 1, \dots, 1$ do
        find distinct clusters $A, B \in \mathcal{C}_{i+1}$ minimizing $D_{\mathrm{drad}}(A, B)$;
        $\mathcal{C}_i := (\mathcal{C}_{i+1} \setminus \{A, B\}) \cup \{A \cup B\}$;
    end
    return $\mathcal{C}_1, \dots, \mathcal{C}_n$ (or single $\mathcal{C}_k$)

Theorem 6.20
Let $D$ be a distance metric on $M \subseteq \mathbb{R}^d$. Then for all sets $P \subset M$ and all $k \le |P|$, Algorithm AgglomerativeDiscreteRadius computes a $k$-clustering $\mathcal{C}_k$ with
$\mathrm{cost}_{\mathrm{drad}}(\mathcal{C}_k) < O(d) \cdot \mathrm{opt}_k$.
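The only change relative to the linkage-based variants is the merge criterion: the pair whose union has the smallest discrete radius is merged. A minimal sketch, assuming distinct points given as tuples and `math.dist` as $D$ (function names are illustrative, and the brute-force search is for clarity only):

```python
import math
from itertools import combinations

def drad(S, D=math.dist):
    """Discrete radius of S: best center chosen from S itself."""
    return min(max(D(x, m) for x in S) for m in S)

def agglomerative_discrete_radius(P, k, D=math.dist):
    """Merge the pair of clusters whose union has minimum discrete radius until k remain."""
    clustering = [frozenset([p]) for p in P]
    while len(clustering) > k:
        A, B = min(combinations(clustering, 2),
                   key=lambda pair: drad(pair[0] | pair[1], D))
        clustering = [C for C in clustering if C is not A and C is not B]
        clustering.append(A | B)
    return clustering

if __name__ == "__main__":
    P = [(0, 0), (2, 0), (4, 0), (20, 0)]
    print([sorted(C) for C in agglomerative_discrete_radius(P, 2)])
```

In the usage example, adding (4, 0) to the cluster {(0, 0), (2, 0)} keeps the discrete radius at 2 because (2, 0) can serve as center, so that merge is preferred over any merge involving (20, 0).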

Hierarchical clusterings and dendrograms

Hierarchical clustering
Given distance measure $D : M \times M \to \mathbb{R}$, $k \in \mathbb{N}$, and $P \subset M$, $|P| = n$, a sequence of clusterings $\mathcal{C}_n, \dots, \mathcal{C}_1$ with $|\mathcal{C}_k| = k$ is called a hierarchical clustering of $P$ if for all $A \in \mathcal{C}_k$
1. $A \in \mathcal{C}_{k+1}$ or
2. $\exists B, C \in \mathcal{C}_{k+1} : A = B \cup C$ and $\mathcal{C}_k = (\mathcal{C}_{k+1} \setminus \{B, C\}) \cup \{A\}$.

Dendrograms
A dendrogram on $n$ nodes is a rooted binary tree $T = (V, E)$ with an index function $\chi : V \setminus \{\text{leaves of } T\} \to \{1, \dots, n\}$ such that
▶ $\forall v \ne w : \chi(v) \ne \chi(w)$
▶ $\chi(\text{root}) = n$
▶ $\forall u, v$: if $v$ is the parent of $u$, then $\chi(v) > \chi(u)$.
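The hierarchical clustering condition can be checked mechanically: going from $\mathcal{C}_{k+1}$ to $\mathcal{C}_k$, exactly two clusters disappear and their union appears. The following sketch assumes each clustering is a partition of the same point set, given as a collection of `frozenset`s in a dict keyed by $k$; the function name `is_hierarchical` is illustrative.

```python
def is_hierarchical(levels):
    """Check the hierarchical clustering condition.

    levels maps k to the clustering C_k (a collection of frozensets), k = 1..n.
    Each C_k is assumed to be a partition of the same point set.
    """
    n = len(levels)
    if set(levels) != set(range(1, n + 1)):
        return False
    if any(len(set(levels[k])) != k for k in levels):
        return False  # C_k must consist of exactly k clusters
    for k in range(1, n):
        coarse, fine = set(levels[k]), set(levels[k + 1])
        new = coarse - fine    # clusters of C_k that are not in C_{k+1}
        gone = fine - coarse   # clusters of C_{k+1} that are not in C_k
        if len(new) != 1 or len(gone) != 2:
            return False
        (A,) = new
        B, C = gone
        if A != B | C:
            return False  # the new cluster must be the union of the two removed ones
    return True

if __name__ == "__main__":
    f = frozenset
    good = {3: [f("a"), f("b"), f("c")], 2: [f("ab"), f("c")], 1: [f("abc")]}
    print(is_hierarchical(good))
```

Every cluster of $\mathcal{C}_k$ other than the single new one is unchanged from $\mathcal{C}_{k+1}$, which corresponds to case 1 of the definition; the new cluster corresponds to case 2.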

From hierarchical clusterings to dendrograms

$\mathcal{C}_n, \dots, \mathcal{C}_1$ hierarchical clustering of $P$.

Construction of dendrogram
▶ create a leaf for each point $p \in P$
▶ interior nodes correspond to unions of clusters
▶ if the $k$-th cluster is obtained by the union of clusters $B, C$, create a new node with index $k$ and with children $B, C$.

Dendrograms

AgglomerativeCompleteLinkage
▶ Start with one cluster for each input object.
▶ Iteratively merge the two closest clusters.

Complete linkage measure
$D_{CL}(C_1, C_2) = \max_{x \in C_1, y \in C_2} D(x, y)$

[Figure: dendrogram with leaves A, B, C, D, E next to the corresponding point set]
