
k-variates++: more pluses in the k-means++ (Poster #29, Mon. 3-7pm)



  1. k-variates++: more pluses in the k-means++ (Poster #29, Mon. 3-7pm). Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen. DATA61 (formerly NICTA) | ANU | TECHNION | ECOLE POLYTECHNIQUE | UNSW | SONY CS LABS, INC. www.data61.csiro.au

  2. In this talk: k-variates++
     ❖ A generalization of the popular k-means++ seeding
     ❖ Two theorems on k-variates++ (and more! see poster):
        ❖ guarantees on approximation of the global optimum
        ❖ likelihood ratio bound between neighbouring instances (see paper!)
     ❖ Applications: "reductions" between clustering algorithms + approximation bounds of new clustering algorithms, privacy


  5. Motivation
     ❖ k-means++ seeding = a gold standard in clustering:
     ❖ utterly simple to implement (iteratively pick centers ∼ squared distance to the previous centers)
     ❖ assumption-free (expected) approximation guarantee wrt the k-means global optimum (Arthur & Vassilvitskii, SODA 2007):
          E_C[potential] ≤ (2 + log k) · 8 φ_opt
     ❖ inspired many variants (tensor clustering, distributed, data stream, on-line, parallel clustering, clustering without closed-form centroids, etc.)
     [Diagram: k-means++ surrounded by its variants: distributed, on-line, streamed, no closed-form centroid, more potentials]

  6. Motivation
     ❖ Approaches are spawns of k-means++:
        ❖ modify the algorithm, or
        ❖ use it as a building block
     ❖ Our objective: put all in the same "bag": a generalisation of k-means++ (∼ k-variates++) from which such approaches would be just "instantiations" ⇒ reductions ⇒ more applications
     ❖ Because general ⇒ new applications
     [Diagram: k-variates++ covering the distributed, on-line, streamed and no-closed-form-centroid variants of k-means++]

  7. k-means++ (Arthur & Vassilvitskii, SODA'07)
     Input: data A ⊂ R^d with |A| = m, k ∈ N*;
     Step 1: initialise centers C ← ∅;
     Step 2: for t = 1, 2, ..., k:
        2.1: randomly sample a ∼ q_t over A, with q_1 = u_m and, for t > 1,
             q_t(a) = D_t(a) / Σ_{a'∈A} D_t(a'), where D_t(a) = min_{x∈C} ‖a − x‖²₂;
        2.2: x ← a;
        2.3: C ← C ∪ {x};
     Output: C;
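A minimal runnable sketch of the seeding above, assuming NumPy; the function name and API are mine, not from the paper:

```python
import numpy as np

def kmeans_pp_seeding(A, k, seed=None):
    """k-means++ seeding (Arthur & Vassilvitskii, SODA'07) on the rows of A (m x d)."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    C = [A[rng.integers(m)]]  # t = 1: q_1 = u_m, uniform over A
    for t in range(2, k + 1):
        # D_t(a) = min_{x in C} ||a - x||^2_2 for every a in A
        D = np.min(((A[:, None, :] - np.asarray(C)[None, :, :]) ** 2).sum(-1), axis=1)
        a = A[rng.choice(m, p=D / D.sum())]  # 2.1: a ~ q_t, proportional to D_t
        C.append(a)                          # 2.2-2.3: x <- a, C <- C u {x}
    return np.asarray(C)
```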

  8. k-variates++
     Input: data A ⊂ R^d with |A| = m, k ∈ N*, random variables {X_a, a ∈ A}, probe functions ℘_t : A → R^d (t ≥ 1);
     Step 1: initialise centers C ← ∅;
     Step 2: for t = 1, 2, ..., k:
        2.1: randomly sample a ∼ q_t over A, with q_1 = u_m and, for t > 1,
             q_t(a) = D_t(a) / Σ_{a'∈A} D_t(a'), where D_t(a) = min_{x∈C} ‖℘_t(a) − x‖²₂;
        2.2: randomly sample x ∼ X_a;
        2.3: C ← C ∪ {x};
     Output: C;
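A hedged sketch of the generalisation, reusing NumPy; `probe` and `sample_x` are my illustrative stand-ins for ℘_t and X_a. With an identity probe and Dirac densities it reduces to the k-means++ sketch above:

```python
import numpy as np

def k_variates_pp(A, k, probe, sample_x, seed=None):
    """probe(t, A) -> probed points (m x d); sample_x(i, rng) draws from X_{a_i}."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    i = rng.integers(m)                    # 2.1 at t = 1: q_1 = u_m
    C = [sample_x(i, rng)]                 # 2.2: x ~ X_a
    for t in range(2, k + 1):
        P = probe(t, A)                    # apply the probe to every a in A
        D = np.min(((P[:, None, :] - np.asarray(C)[None, :, :]) ** 2).sum(-1), axis=1)
        i = rng.choice(m, p=D / D.sum())   # 2.1: a ~ q_t, computed on probed points
        C.append(sample_x(i, rng))         # 2.2: the new center is drawn from X_a, not a itself
    return np.asarray(C)

# k-means++ as an instantiation (identity probe, Dirac densities):
#   k_variates_pp(A, k, probe=lambda t, A: A, sample_x=lambda i, rng: A[i])
# Gaussian variates X_a = N(a, sigma^2 I) are another legal choice:
#   k_variates_pp(A, k, probe=lambda t, A: A, sample_x=lambda i, rng: rng.normal(A[i], sigma))
```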

  9. Two theorems & applications

  10. Theorem 1 (approximation of the global optimum)
     ❖ k-means potential for C: φ(A; C) = Σ_{a∈A} ‖a − c(a)‖²₂, with c(a) = argmin_{c∈C} ‖a − c‖²₂
     ❖ Suppose ℘_t is η-stretching (η ≥ 0): for any optimal cluster A with size > 1 and any a₀ ∈ A,
          φ(A; {a₀}) / φ(A; C) ≤ (1 + η) · φ(℘_t(A); {℘_t(a₀)}) / φ(℘_t(A); C), ∀t
     ❖ Then E_{C ∼ k-variates++}[φ(A; C)] ≤ (2 + log k) · Φ, with
          Φ = (6 + 4η) φ_opt + 2 φ_bias + 2 φ_var,
          φ_opt  := Σ_{a∈A} ‖a − c_opt(a)‖²₂,
          φ_bias := Σ_{a∈A} ‖E[X_a] − c_opt(a)‖²₂,
          φ_var  := Σ_{a∈A} tr(cov[X_a])

  11. Theorem 1 (cont.): k-means++ is the instantiation with probe ℘_t = Id and X_a = Diracs (δ_a)

  12. Theorem 1 (cont.): for k-means++, φ_bias = φ_opt, φ_var = 0 and η = 0, so Φ = (6 + 4·0) φ_opt + 2 φ_opt + 2·0 = 8 φ_opt, recovering the Arthur & Vassilvitskii bound

  13. Remarks
     ❖ The guarantee approaches the statistical lower bound (Fréchet-Cramér-Rao-Darmois)
     ❖ Can be better than the Arthur & Vassilvitskii bound, in particular if φ_bias < φ_opt: with η = 0, Φ = 6 φ_opt + 2 φ_bias + 2 φ_var < 8 φ_opt whenever φ_bias + φ_var < φ_opt
     ❖ φ_bias = the knob through which background / domain knowledge may improve the general bound

  14. Applications
     ❖ Reductions from k-variates++ ⇒ approximability ratios:
        ❖ pick a clustering algorithm L,
        ❖ show that the expected output of L = that of k-variates++ for particular choices of X and ℘_t (note: no computational constraint, we just need existence),
        ❖ get an approximability ratio for L!

  15. Summary (poster, paper)

     Setting     | Algorithm L   | Probe functions ℘_t                                | Densities X
     ------------|---------------|----------------------------------------------------|-------------------------------
     Batch       | k-means++     | Identity                                           | Diracs
     Distributed | d-k-means++   | Identity                                           | Uniform, support = subsets
     Distributed | p+d-k-means++ | Identity                                           | Non-uniform, compact support
     Streaming   | s-k-means++   | Synopses                                           | Diracs
     On-line     | ol-k-means++  | Point (batch not hit) / closest center (batch hit) | Diracs
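As one illustration (my reading of the table's distributed row against the k_variates_pp sketch after slide 8, not the paper's pseudo-code): identity probe, and X_a uniform over the data subset held by a's machine:

```python
# parts: list of index arrays, one per machine; node_of[i] = machine holding point i.
# d_kmeans_pp = lambda A, k: k_variates_pp(
#     A, k,
#     probe=lambda t, A: A,                                     # identity probe
#     sample_x=lambda i, rng: A[rng.choice(parts[node_of[i]])]  # uniform on the subset
# )
```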

