

  1. Clustering Shrinkage, L0 and Staircases
     K. Pelckmans, J.A.K. Suykens, B. De Moor
     NIPS workshop on theoretical foundations of clustering, December 2005
     K.U.Leuven - Department of Electrical Engineering - SCD/SISTA, Kasteelpark Arenberg 10, 3001 Heverlee (Leuven), Belgium
     Kristiaan.Pelckmans@esat.kuleuven.ac.be

  2. Optimization view to Clustering
     Empirical Convex Clustering Shrinkage:
     • Dataset {x_i}_{i=1}^N ⊂ R^D
     • N centroids {M_i}_{i=1}^N ⊂ R^D
     • Given γ ≥ 0
     • Distance measure ||·||
     • Convex complexity measure ℓ: R^D → R^+
     [Figure: empirical CCS solutions on a 2-D toy dataset for γ = 0, γ = 10 and γ = 10000]
     Convex programming problem (a solver sketch follows this slide):

       \min_{M_i} J_\gamma(M_i) = \frac{1}{2} \sum_{i=1}^{N} \| x_i - M_i \|_p + \gamma \sum_{i<j} \ell(M_i - M_j)

     → Pelckmans et al., Convex Clustering Shrinkage, PASCAL workshop 2005
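The convex program above can be handed directly to a generic solver. Below is a minimal sketch, assuming the cvxpy modelling library is available, instantiating the distance as the squared Euclidean norm and ℓ = |·|_1; the two-cluster toy data and the value of γ are illustrative choices, not taken from the talk.

```python
# Minimal sketch of empirical Convex Clustering Shrinkage (assumes cvxpy is installed;
# toy data and gamma are illustrative choices, not the authors' experiments).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)),   # two well-separated synthetic clusters
               rng.normal(3.0, 0.3, (10, 2))])
N, D = X.shape
gamma = 0.2                                     # shrinkage parameter, gamma >= 0 (illustrative)

M = cp.Variable((N, D))                         # one centroid M_i per data point
fit = 0.5 * cp.sum_squares(M - X)               # 1/2 * sum_i ||x_i - M_i||_2^2
fuse = sum(cp.norm(M[i, :] - M[j, :], 1)        # ell = |.|_1 on all pairwise differences
           for i in range(N) for j in range(i + 1, N))
cp.Problem(cp.Minimize(fit + gamma * fuse)).solve()

# Data points whose fitted centroids coincide (up to rounding) share a cluster.
labels = np.unique(np.round(M.value, 3), axis=0, return_inverse=True)[1]
print("distinct centroids:", labels.max() + 1)
```

The pairwise ℓ_1 fusion term is what drives the centroids of nearby points to become exactly equal, which is how cluster membership is read off the solution.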

  3. • γ = 0: M_i = x_i
     • γ → +∞: M_1 = ... = M_N = X̄ (the sample mean)
     • ℓ = |·|_1
     • Ranging γ: an increasing number of sparse differences M_i − M_j (see the sweep sketch after this slide)
     • Univariate x_i ∈ R: M_i → discrete, m(x_i) → continuous
     [Figure: univariate cluster map m(X) versus X, with m(x) = m(x') for points x, x' assigned to the same cluster]
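A small one-dimensional sketch of this γ path, again assuming cvxpy and using illustrative data: as γ grows, the number of distinct fitted centroid values drops from N towards one.

```python
# Sketch of the gamma path in 1-D: count distinct fused centroid values as the
# shrinkage parameter grows (assumes cvxpy; data and gamma grid are illustrative).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
x = np.sort(np.r_[rng.normal(0.0, 0.2, 8), rng.normal(2.0, 0.2, 8)])
N = len(x)

for gamma in [0.0, 0.05, 0.5, 5.0]:
    M = cp.Variable(N)
    obj = 0.5 * cp.sum_squares(M - x) + gamma * sum(
        cp.abs(M[i] - M[j]) for i in range(N) for j in range(i + 1, N))
    cp.Problem(cp.Minimize(obj)).solve()
    print(f"gamma = {gamma:>4}: {len(np.unique(np.round(M.value, 3)))} distinct centroid values")
```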

  4. Clustering Shrinkage (Ct'd)
     Modifications:
     • 0-norm (count differing pairs) → non-convex, but interpretable!
     • ε-neighborhood: a ball B(ε) with measure |B(ε)|

       \hat{m}_\epsilon = \arg\min_{m:\, \mathbb{R}^D \to \mathbb{R}^D} J^{\epsilon,p}_\gamma(m)
                        = \sum_{i=1}^{N} \| m(x_i) - x_i \|_p
                          + \frac{\gamma}{|B(\epsilon)|} \sum_{i=1}^{N} \sum_{j:\, \|x_i - x_j\| \le \epsilon} I\big( \| m(x_i) - m(x_j) \| > 0 \big)   (1)

     → The second term measures the density of differently assigned data points in a local neighborhood (cf. the histogram density estimator).
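Since (1) is non-convex, it is most easily read as something one can evaluate for a candidate cluster map. A minimal Python sketch follows; the helper name, the zero-tolerance, the ball volume passed explicitly, and the toy call are illustrative assumptions, not part of the talk.

```python
# Sketch: evaluate the non-convex objective J^{eps,p}_gamma of (1) for a candidate
# cluster map given by its values m_x[i] = m(x_i) (all names are illustrative).
import numpy as np

def ccs_eps_objective(X, m_x, gamma, eps, ball_volume, p=2, tol=1e-9):
    """Fit term plus the local different-assignment penalty of equation (1)."""
    N = len(X)
    fit = sum(np.linalg.norm(m_x[i] - X[i], ord=p) for i in range(N))
    penalty = 0
    for i in range(N):
        for j in range(N):
            if np.linalg.norm(X[i] - X[j]) <= eps:                    # x_j in the eps-neighborhood of x_i
                penalty += np.linalg.norm(m_x[i] - m_x[j]) > tol      # I(||m(x_i) - m(x_j)|| > 0)
    return fit + gamma / ball_volume * penalty

# Toy usage: ten 1-D points assigned to two candidate cluster centres;
# in 1-D the ball B(eps) has measure 2*eps.
X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
m_x = np.where(X < 0.5, 0.25, 0.75)
print(ccs_eps_objective(X, m_x, gamma=1.0, eps=0.3, ball_volume=0.6))
```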

  5. Clustering Shrinkage (Ct'd)
     Definition 1. [Theoretical Shrinkage Clustering] Let m: R → R be such that
       \lim_{\|\delta\| \to 0} \frac{m(x - \delta) - m(x + \delta)}{|B(\|\delta\|)|}
     exists almost everywhere. Let the cdf P(x) underlying the dataset be known and assume its pdf p(x) exists everywhere and is nonzero on a connected compact interval C ⊂ R with nonzero measure |C| > 0. We will study the following theoretical counterpart to (1):

       \hat{m} = \arg\min_{m:\, \mathbb{R} \to \mathbb{R}} J^{p,0}_\gamma(m)
               = \int_C \| m(x) - x \|_p \, dP(x) + \gamma \int_C \| m'(x) \|_0 \, dP(x),   (2)

     where the latter term, denoted further as the zero-norm variation, is defined formally as

       \| m'(x) \|_0 \triangleq \lim_{\epsilon \to 0} \frac{ I\big( m(B(x;\epsilon)) \neq \mathrm{const} \big) }{ |B(x,\epsilon)| },   (3)

     with the characteristic function I( m(B(x;ε)) ≠ const ) equal to one if there exists y ∈ B(x;ε) such that ||m(x) − m(y)|| > 0 (i.e. B(x,ε) contains parts of different clusters), and equal to zero otherwise.

  6. Clustering Shrinkage (Ct'd)
     Theorem 1. [Univariate Staircase Representation] When P(x) is a fixed, smooth and differentiable distribution function with pdf p: R → R^+ which is nonzero on a compact interval C ⊂ R, the minimizer of (2) takes the form of a staircase function, uniquely defined on C, with a finite number of positive steps (say K < +∞) of sizes a = (a_1, ..., a_K)^T ∈ R^K at the points D(K) = {x_(k)}_{k=1}^K ⊂ C:

       \hat{m}\big(x;\, a, D(K)\big) = \sum_{k=1}^{K} a_k \, I\big( x > x_{(k)} \big)
       \quad \text{s.t.} \quad a_k \ge 0, \; x_{(k)} \in C \;\; \forall k.   (4)

     Moreover, the optimization problem (2) is equivalent to the problem

       \min_{a,\, D(K)} J^{p}_{K}\big(a, D(K)\big)
         = \int_C \Big\| \sum_{k=1}^{K} a_k \, I\big( x > x_{(k)} \big) - x \Big\|_p \, p(x)\, dx
           + \sum_{k=1}^{K} p\big( x_{(k)} \big),   (5)

     where K ∈ N relates to γ ∈ R^+ in a way depending on D.
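To make the equivalent problem (5) concrete, here is a small numerical sketch that evaluates J^p_K(a, D(K)) for a candidate staircase on C = [0, 1] under an assumed uniform pdf; the grid-based integration and the example steps are illustrative choices, not part of the talk.

```python
# Sketch: evaluate the staircase objective (5) for a candidate staircase on C = [0, 1]
# under an assumed uniform pdf (grid integration and example knots are illustrative).
import numpy as np

def staircase(x, a, knots):
    """m(x; a, D(K)) = sum_k a_k * I(x > x_(k)), cf. equation (4)."""
    return sum(a_k * (x > x_k) for a_k, x_k in zip(a, knots))

def J_K(a, knots, pdf, grid):
    """Fit integral of (5) approximated on a grid, plus the sum of pdf values at the knots."""
    dx = grid[1] - grid[0]
    # In 1-D the ||.||_p norm of a scalar reduces to the absolute value.
    fit = np.sum(np.abs(staircase(grid, a, knots) - grid) * pdf(grid)) * dx
    penalty = sum(pdf(x_k) for x_k in knots)
    return fit + penalty

uniform_pdf = lambda x: np.ones_like(np.asarray(x, dtype=float))  # p(x) = 1 on C = [0, 1]
grid = np.linspace(0.0, 1.0, 2001)

# A two-step staircase: jumps of 0.5 and 0.4 at the knots x = 0.3 and x = 0.7.
print(J_K(a=[0.5, 0.4], knots=[0.3, 0.7], pdf=uniform_pdf, grid=grid))
```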

  7. Interpretations
     Unifying perspective:
     • Vector Quantization (k-means)
     • Bump-hunting and max-cut
     • Optimal coding: "finding a short code for X that preserves the maximum information about X itself." L_2 → KL
     • Optimal bin placement
     Main message:
     • An optimization view to clustering
     • Clustering → the study of the class of staircase functions (cf. classification).
