Probabilistic clustering of high dimensional norms Assaf Naor Princeton University SODA’17
Partitions of metric spaces
Let $(X, d_X)$ be a metric space and $\mathcal{P}$ a partition of $X$.
Given a point $x \in X$, $\mathcal{P}(x)$ is the unique cluster in $\mathcal{P}$ which contains $x$.
Given $\Delta > 0$, the partition $\mathcal{P}$ is $\Delta$-bounded if all the clusters of $\mathcal{P}$ have diameter at most $\Delta$:
$$\forall x \in X, \qquad \mathrm{diam}_X\big(\mathcal{P}(x)\big) := \max_{u,v \in \mathcal{P}(x)} d_X(u,v) \le \Delta.$$
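The definition of a $\Delta$-bounded partition can be checked directly on a finite metric space. The following is a minimal sketch; the function names, the example partition, and the choice of the real line as the metric space are illustrative assumptions, not part of the talk.

```python
from itertools import combinations


def cluster_diameter(cluster, dist):
    """Largest pairwise distance inside one cluster (0.0 for a singleton)."""
    return max((dist(u, v) for u, v in combinations(cluster, 2)), default=0.0)


def is_bounded(partition, dist, delta):
    """True if every cluster of the partition has diameter at most delta."""
    return all(cluster_diameter(cluster, dist) <= delta for cluster in partition)


def line_metric(u, v):
    """Hypothetical example metric: the usual distance on the real line."""
    return abs(u - v)


# Hypothetical example: six points on the line, split into three clusters.
example_partition = [[0.0, 0.4, 0.9], [2.0, 2.5], [5.0]]
```

Here the largest cluster diameter is 0.9, so `is_bounded(example_partition, line_metric, 1.0)` holds while the same check with `delta=0.8` fails.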
Random partitions
- Key goal in several areas of computer science and mathematics: use a $\Delta$-bounded partition to “simplify” the metric space.
- The partition should “mimic” the coarse geometric structure (at distance scale $\Delta$) in some meaningful way.
- Regions near boundaries should be “thin.”
- Quite paradoxical, but randomness helps here…
Separating random partitions
Definition (Bartal, 1996): Suppose that $(X, d_X)$ is a metric space and $\sigma, \Delta > 0$. A distribution over $\Delta$-bounded random partitions $\mathcal{P}$ of $X$ is said to be $\sigma$-separating if
$$\forall x,y \in X, \qquad \mathbb{P}\big[\mathcal{P}(x) \neq \mathcal{P}(y)\big] \le \frac{\sigma}{\Delta}\, d_X(x,y).$$
(Implicit in several early works, with a variety of applications: Leighton-Rao [1988], Awerbuch-Peleg [1990], Linial-Saks [1991], Alon-Karp-Peleg-West [1991], Klein-Plotkin-Rao [1993], Rao [1999].)
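As a sanity check of the definition, consider the simplest case, which is not taken from the talk: the real line partitioned by a randomly shifted grid of cell width $\Delta$. Each cluster is an interval of length $\Delta$, and two points $x, y$ are separated with probability exactly $\min\{1, |x-y|/\Delta\}$, so this distribution is $1$-separating. The sketch below estimates that probability by simulation; all names are hypothetical.

```python
import math
import random


def shifted_grid_cluster(x, delta, shift):
    """Index of the grid cell [shift + k*delta, shift + (k+1)*delta) containing x."""
    return math.floor((x - shift) / delta)


def estimate_separation_probability(x, y, delta, trials=100_000, seed=0):
    """Monte Carlo estimate of P[P(x) != P(y)] for a uniformly shifted grid."""
    rng = random.Random(seed)
    separated = 0
    for _ in range(trials):
        shift = rng.uniform(0.0, delta)  # one random partition per trial
        if shifted_grid_cluster(x, delta, shift) != shifted_grid_cluster(y, delta, shift):
            separated += 1
    return separated / trials
```

For instance, with `delta=1.0` the estimate for the pair `0.3, 0.55` should be close to $0.25 = |x-y|/\Delta$, up to sampling error.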
Modulus of separated decomposability
Denote by $\mathsf{SEP}(X)$ the minimum $\sigma > 0$ such that for every $\Delta > 0$ there is a $\sigma$-separating distribution over $\Delta$-bounded random partitions of $(X, d_X)$.
Note: we are ignoring here technical measurability issues that are important for mathematical applications in the infinite setting. For TCS purposes, it suffices to deal with random partitions of finite subsets of $X$.
Theorem (Bartal, 1996): If $|X| = n$ then $\mathsf{SEP}(X) \lesssim \log n$.
Goal of present work: to study $\mathsf{SEP}(X)$ for finite dimensional normed spaces $X$ (and subsets thereof). Originated in Peleg-Reshef [1998], followed by important work of Charikar-Chekuri-Goel-Guha-Plotkin [1998].
Sharp a priori bounds
Theorem: Suppose that $X$ is an $n$-dimensional normed space. Then
$$\sqrt{n} \lesssim \mathsf{SEP}(X) \lesssim n.$$
The upper bound follows from [CCGGP98]. The lower bound hasn’t been noticed before: it follows from a theorem of Bourgain-Szarek (1988) that is a consequence of the Bourgain-Tzafriri restricted invertibility principle (1987).
$\sqrt{n} \lesssim \mathsf{SEP}(X) \lesssim n$: both bounds are asymptotically sharp, as shown in [CCGGP98]. In fact, it is proved there that
$$\mathsf{SEP}(\ell_2^n) \asymp \sqrt{n} \qquad\text{and}\qquad \mathsf{SEP}(\ell_1^n) \asymp n.$$
For $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ and $p \in [1, \infty)$,
$$\|x\|_{\ell_p^n} := \Big(\sum_{j=1}^n |x_j|^p\Big)^{1/p}, \qquad \|x\|_{\ell_\infty^n} := \max_{j \in \{1,\ldots,n\}} |x_j|.$$
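The norms just defined can be computed as follows. This is a small illustrative helper (the name `lp_norm` is an assumption), with $p = \infty$ represented by `math.inf`.

```python
import math


def lp_norm(x, p):
    """The l_p^n norm of a sequence x, for 1 <= p <= infinity."""
    if p == math.inf:
        return max(abs(t) for t in x)  # l_infinity: largest coordinate in absolute value
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    return sum(abs(t) ** p for t in x) ** (1.0 / p)
```

For the vector `[3.0, -4.0]`, the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms are 7, 5 and 4 respectively, illustrating that $\|x\|_{\ell_p^n}$ is non-increasing in $p$.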
In [CCGGP98], Charikar-Chekuri-Goel-Guha-Plotkin asserted that
$$\mathsf{SEP}(\ell_p^n) \asymp \begin{cases} n^{1/p} & \text{if } 1 \le p \le 2, \\ n^{1 - 1/p} & \text{if } 2 \le p \le \infty. \end{cases}$$
The upper bound on $\mathsf{SEP}(\ell_p^n)$ in the above equivalence is valid as stated for all $1 \le p \le \infty$, but we show here that the matching lower bound is incorrect when $2 < p \le \infty$. Thus, in particular, we obtain an asymptotically better probabilistic clustering of, say, $\ell_\infty^n$.
Theorem: For every $p \in [2, \infty]$,
$$\mathsf{SEP}(\ell_p^n) \lesssim \sqrt{n \min\{p, \log n\}}.$$
In particular, when $p = \infty$ the previous best known bound was $\mathsf{SEP}(\ell_\infty^n) \lesssim n$ (and this was asserted in [CCGGP98] to be sharp), but here we show that actually
$$\sqrt{n} \lesssim \mathsf{SEP}(\ell_\infty^n) \lesssim \sqrt{n \log n}.$$
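To get a feel for the size of the improvement, one can evaluate the two upper bounds numerically. The sketch below ignores the unspecified universal constants hidden in $\lesssim$, so the numbers indicate growth rates only; the function names are hypothetical.

```python
import math


def previous_upper_bound(n, p):
    """The earlier bound n^(1 - 1/p) for 2 <= p <= infinity (constants ignored)."""
    if p < 2 or n < 2:
        raise ValueError("expects p >= 2 and n >= 2")
    return n ** (1.0 - 1.0 / p)  # 1.0 / math.inf evaluates to 0.0, giving n


def new_upper_bound(n, p):
    """The bound sqrt(n * min(p, log n)) from the theorem (constants ignored)."""
    if p < 2 or n < 2:
        raise ValueError("expects p >= 2 and n >= 2")
    return math.sqrt(n * min(p, math.log(n)))
```

With `n = 10**6` and `p = math.inf`, the earlier bound is of order $10^6$, whereas the new one is of order $10^3 \sqrt{\log 10^6}$, a few thousand.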
The error in [CCGGP98] arose because it relied on unpublished work of Indyk (1998), which has remained unpublished since then; we confirmed with Indyk, as well as with some of the authors of [CCGGP98], that there is indeed a flaw in the cited (unpublished) work of Indyk. There is no flaw in the proof of [CCGGP98] in the range $p \in [1,2]$, i.e.,
$$p \in [1,2] \implies \mathsf{SEP}(\ell_p^n) \asymp n^{1/p}.$$
Refined probabilistic partitions for sparse or rapidly decaying vectors
For $n \in \mathbb{N}$ and $k \in \{1, \ldots, n\}$ denote by $(\ell_p^n)_{\le k}$ the subset of $\mathbb{R}^n$ consisting of all of those vectors with at most $k$ nonzero entries, equipped with the $\ell_p^n$ metric.
Theorem: For every $p \ge 1$ we have
$$\mathsf{SEP}\big((\ell_p^n)_{\le k}\big) \lesssim k^{\max\{\frac{1}{p}, \frac{1}{2}\}} \sqrt{\log\Big(\frac{n}{k}\Big) + \min\{p, \log n\}}.$$
The special case $p = 2$ becomes
$$\forall k \in \{1, \ldots, n\}, \qquad \mathsf{SEP}\big((\ell_2^n)_{\le k}\big) \lesssim \sqrt{k \log\Big(\frac{en}{k}\Big)}.$$
A curious aspect of this bound is that despite the fact that it is a statement about Euclidean geometry, our proof involves non-Euclidean geometric considerations. Specifically, the ubiquitous “iterative ball partitioning method” is applied to balls in $\ell_p^n$ with $p = 1 + \log(n/k)$.
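Mixed-metric random partitions
Theorem: For every $p \in [1, \infty]$ and $\Delta > 0$ there exists a distribution over random partitions $\mathcal{P}$ of $\mathbb{R}^n$ with the following properties.
1) $\forall x \in \mathbb{R}^n$, $\mathrm{diam}_{\ell_p^n}\big(\mathcal{P}(x)\big) \le \Delta$.
2) For every $x, y \in \mathbb{R}^n$,
$$\mathbb{P}\big[\mathcal{P}(x) \neq \mathcal{P}(y)\big] \lesssim \frac{n^{1/p} \sqrt{\min\{p, \log n\}}}{\Delta}\, \|x - y\|_{\ell_2^n}.$$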
In particular, the special case $p = \infty$ shows that one can obtain a random partition of $\mathbb{R}^n$ into clusters of $\ell_\infty^n$ diameter at most $\Delta$, yet with the exponentially stronger Euclidean separation property
$$\forall x, y \in \mathbb{R}^n, \qquad \mathbb{P}\big[\mathcal{P}(x) \neq \mathcal{P}(y)\big] \lesssim \frac{\sqrt{\log n}}{\Delta}\, \|x - y\|_{\ell_2^n}.$$
Iterative ball partitioning method
Karger-Motwani-Sudan (1998), Charikar-Chekuri-Goel-Guha-Plotkin (1998), Calinescu-Karloff-Rabani (2001).
Iteratively remove balls of radius $\Delta/2$ centered at i.i.d. points in the normed space $X$.
$$B_X = \{x \in X : \|x\|_X \le 1\}.$$
[Figure: the successively removed balls $x_1 + \frac{\Delta}{2} B_X$ and $x_2 + \frac{\Delta}{2} B_X$.]
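The procedure can be sketched for a finite point set in $\mathbb{R}^n$ with an $\ell_p$ norm. This is a simplified illustration under stated assumptions, not the construction analysed in the talk: centers are drawn uniformly from the bounding box of the points enlarged by $\Delta/2$ (every center that can capture a point lies in that box, since the $\ell_p$ ball is contained in the $\ell_\infty$ ball of the same radius), and all function and parameter names are hypothetical.

```python
import math
import random


def lp_distance(u, v, p):
    """Distance between u and v in the l_p norm, 1 <= p <= infinity."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def iterative_ball_partition(points, delta, p=2.0, seed=0, max_rounds=1_000_000):
    """Partition point indices by repeatedly carving out balls of radius delta / 2."""
    rng = random.Random(seed)
    radius = delta / 2.0
    dim = len(points[0])
    # Assumption: sample centers from the bounding box enlarged by the radius.
    lows = [min(pt[j] for pt in points) - radius for j in range(dim)]
    highs = [max(pt[j] for pt in points) + radius for j in range(dim)]
    unassigned = set(range(len(points)))
    clusters = []
    rounds = 0
    while unassigned:
        if rounds >= max_rounds:
            raise RuntimeError("gave up before every point was assigned")
        rounds += 1
        center = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        # The new cluster is whatever part of the ball has not been removed yet.
        captured = [i for i in sorted(unassigned)
                    if lp_distance(points[i], center, p) <= radius]
        if captured:
            clusters.append(captured)
            unassigned.difference_update(captured)
    return clusters
```

Any two points in the same cluster lie within $\Delta/2$ of a common center, so the triangle inequality gives cluster diameter at most $\Delta$. In high dimension a random center rarely captures anything, so this naive sampler is only practical for small examples.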
Theorem: Let $\|\cdot\|_X$ be a norm on $\mathbb{R}^n$ and let $\mathcal{P}$ be the random partition that is obtained using iterative ball partitioning where the underlying balls are balls of radius $\Delta/2$ in the norm $\|\cdot\|_X$. Then (by design) $\mathrm{diam}_X\big(\mathcal{P}(x)\big) \le \Delta$ for all $x \in \mathbb{R}^n$, and for every $x, y \in \mathbb{R}^n$ we have
$$\mathbb{P}\big[\mathcal{P}(x) \neq \mathcal{P}(y)\big] \lesssim \frac{\mathrm{vol}_{n-1}\big(\mathrm{Proj}_{(x-y)^{\perp}}(B_X)\big)}{\Delta\, \mathrm{vol}_n(B_X)}\, \|x - y\|_{\ell_2^n}.$$
Sharp when the right hand side is $< 1$ (using Schmuckenschläger [1992]).
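As an illustration of how the volume ratio in this theorem behaves, take $X = \ell_2^n$ (a worked case added here, not on the slides). Every hyperplane projection of the Euclidean unit ball is the unit ball of $\ell_2^{n-1}$, so the ratio is $\mathrm{vol}_{n-1}(B_{\ell_2^{n-1}}) / \mathrm{vol}_n(B_{\ell_2^n})$, which grows like $\sqrt{n/(2\pi)}$; this is consistent with the bound $\mathsf{SEP}(\ell_2^n) \lesssim \sqrt{n}$. The sketch below uses the standard formula $\mathrm{vol}_n(B_{\ell_2^n}) = \pi^{n/2} / \Gamma(n/2 + 1)$; the function names are hypothetical.

```python
import math


def log_volume_euclidean_ball(n):
    """Logarithm of the volume of the unit ball of l_2^n: pi^(n/2) / Gamma(n/2 + 1)."""
    return 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)


def euclidean_projection_ratio(n):
    """vol_{n-1}(B_{l_2^{n-1}}) / vol_n(B_{l_2^n}), computed in log space for stability."""
    return math.exp(log_volume_euclidean_ball(n - 1) - log_volume_euclidean_ball(n))
```

For example, `euclidean_projection_ratio(2)` equals $2/\pi$ (a segment of length 2 over a disc of area $\pi$), and for large `n` the ratio divided by `math.sqrt(n)` approaches $1/\sqrt{2\pi}$.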
Extremal hyperplane projections
The previously stated theorems about random partitions of $\ell_p^n$ follow from this general theorem in combination with the evaluation of the extremal volumes of hyperplane projections of the unit ball of $\ell_p^n$ that were obtained by Barthe-N. (2002).
Extremal hyperplane projections
Theorem (Barthe-N., 2002): For every $a \in \mathbb{R}^n \setminus \{0\}$, the following function is increasing in $p$:
$$p \mapsto \frac{\mathrm{vol}_{n-1}\big(\mathrm{Proj}_{a^{\perp}}(B_{\ell_p^n})\big)}{\mathrm{vol}_{n-1}\big(B_{\ell_p^{n-1}}\big)}.$$
When $p \ge 2$, the above ratio attains its maximum when $a = (1, 1, \ldots, 1)$.