
A Note On Spectral Clustering (Pavel Kolev and Kurt Mehlhorn, European Symposia on Algorithms '16) - PowerPoint PPT Presentation


1. A Note On Spectral Clustering. Pavel Kolev and Kurt Mehlhorn. European Symposia on Algorithms '16.

2. Outline: Problem Formulation (Algorithmic Tools); Our Contribution (Structural Result, Algorithmic Result); Proof Overview; Summary.

3-6. k-way Partitioning
Def. A cluster is a subset S ⊆ V with small conductance φ(S) = |E(S, V \ S)| / μ(S), where the volume is μ(S) = Σ_{v∈S} deg(v).
Def. The order-k conductance constant is ρ(k) = min over k-way partitions (P_1, ..., P_k) of max_{i∈[1:k]} φ(P_i).
Goal: Find an approximate k-way partition w.r.t. ρ(k).
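
As a concrete illustration of these definitions, here is a minimal Python sketch, assuming the graph is given as a symmetric numpy adjacency matrix; the helper names are illustrative, not from the talk.

```python
import numpy as np

def volume(adj, S):
    """mu(S): sum of the (weighted) degrees of the vertices in S."""
    deg = adj.sum(axis=1)
    return deg[list(S)].sum()

def conductance(adj, S):
    """phi(S) = w(E(S, V-S)) / mu(S): fraction of S's volume that leaves S."""
    S = list(S)
    T = [v for v in range(adj.shape[0]) if v not in set(S)]
    cut = adj[np.ix_(S, T)].sum()           # total weight of edges crossing the cut
    return cut / volume(adj, S)

def average_conductance(adj, partition):
    """rho_avr for a given k-way partition (P_1, ..., P_k)."""
    return float(np.mean([conductance(adj, P) for P in partition]))

# Toy example: two triangles joined by a single edge.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
print(conductance(A, {0, 1, 2}))                          # 1/7
print(average_conductance(A, [{0, 1, 2}, {3, 4, 5}]))     # also 1/7 here
```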

7. Standard Spectral Clustering Paradigm
Input: G = (V, E), 3 ≤ k ≪ n, and ε ∈ (0, 1).
Output: An approximate k-way partition of V.
Andrew Ng et al. [NIPS'02]:
1) Compute an approximate Spectral Embedding F: V → R^k using the Power Method.
2) Run a k-means clustering algorithm to compute an approximate k-way partition of {F(v) : v ∈ V}.
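
A minimal end-to-end sketch of the two-step paradigm, with scipy's dense eigensolver standing in for the Power Method and scikit-learn's KMeans standing in for the k-means routine; this illustrates the pipeline under those substitutions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse import csgraph
from sklearn.cluster import KMeans

def spectral_partition(adj, k, seed=0):
    """Step 1: embed V into R^k via the bottom k eigenvectors of the normalized
    Laplacian; Step 2: run k-means on the embedded points."""
    deg = adj.sum(axis=1)
    L = csgraph.laplacian(adj, normed=True)           # normalized Laplacian
    _, U = eigh(L, subset_by_index=[0, k - 1])        # bottom k eigenvectors
    F = U / np.sqrt(deg)[:, None]                     # F(v) = U_k(v, :) / sqrt(deg(v))
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(F)
    return [np.flatnonzero(labels == i) for i in range(k)]

# Two triangles joined by one edge should be split back into the two triangles.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
print(spectral_partition(A, k=2))
```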

8. Outline: Problem Formulation (Algorithmic Tools); Our Contribution (Structural Result, Algorithmic Result); Proof Overview; Summary.

9-10. Spectral Graph Theory
The normalized Laplacian matrix L has eigenvalues 0 = λ_1 ≤ ... ≤ λ_k ≤ λ_{k+1} ≤ ... ≤ λ_n ≤ 2.
Fact. A graph has exactly k connected components iff 0 = λ_k < λ_{k+1}.
Trevisan et al. [STOC'12, SODA'14] proved a robust version: λ_k / 2 ≤ ρ(k) ≤ O(k^3) · √(λ_k). (Computing ρ(k) is NP-hard, while λ_k is computable in polynomial time) → approximation scheme!
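
A quick numerical illustration of the fact above: for a disjoint union of two triangles (exactly k = 2 components), the two smallest eigenvalues of L are 0 and the third is strictly positive.

```python
import numpy as np
from scipy.sparse import csgraph

def normalized_laplacian_spectrum(adj):
    """Eigenvalues of the normalized Laplacian, sorted: 0 = lam_1 <= ... <= lam_n <= 2."""
    L = csgraph.laplacian(adj, normed=True)
    return np.linalg.eigvalsh(L)

# Disjoint union of two triangles.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1.0
lam = normalized_laplacian_spectrum(A)
print(np.round(lam, 6))   # lam_1 = lam_2 = 0 < lam_3, and every eigenvalue is <= 2
```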

11-12. Exact Spectral Embedding
U_k = [u_1, u_2, ..., u_k] ∈ R^{V×k}: the bottom k eigenvectors of L.
Normalized Spectral Embedding F: V → R^k, defined by F(v) = (1 / √deg(v)) · U_k(v, :) for every v ∈ V.
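
A short numpy sketch of this embedding, together with the ideal-case intuition behind it: if the graph has exactly k connected components, all vertices of one component are mapped to the same point of R^k (up to numerical error). The function name is illustrative.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse import csgraph

def normalized_spectral_embedding(adj, k):
    """Rows F(v) = U_k(v, :) / sqrt(deg(v)), where U_k holds the bottom k eigenvectors of L."""
    deg = adj.sum(axis=1)
    L = csgraph.laplacian(adj, normed=True)
    _, U = eigh(L, subset_by_index=[0, k - 1])
    return U / np.sqrt(deg)[:, None]

# Ideal case: two disjoint triangles -> each component collapses to a single embedded point.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1.0
print(np.round(normalized_spectral_embedding(A, k=2), 4))
```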

13-14. Approximate Spectral Embedding
U_k ∈ R^{V×k}: an approximation of the bottom k eigenvectors of L, computed with the Power Method.
Approximate Normalized Spectral Embedding F: V → R^k, defined by F(v) = (1 / √deg(v)) · U_k(v, :) for every v ∈ V.
Point Sets: X_V = {F(v) : v ∈ V} and X_E = {deg(v) many copies of F(v) : v ∈ V}.
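
One way to realize the Power Method step is block power iteration on M = 2I - L: since all eigenvalues of L lie in [0, 2], the top k eigenvectors of M are exactly the bottom k eigenvectors of L. A sketch under that choice, not necessarily the exact variant analyzed in the paper:

```python
import numpy as np
from scipy.sparse import csgraph

def approx_spectral_embedding(adj, k, iters=200, seed=0):
    """Approximate F via block power iteration on M = 2I - L."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    L = csgraph.laplacian(adj, normed=True)
    M = 2.0 * np.eye(n) - L
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # random orthonormal start
    for _ in range(iters):
        U, _ = np.linalg.qr(M @ U)                     # power step + re-orthonormalization
    return U / np.sqrt(deg)[:, None]                   # approximate F(v) = U_k(v, :) / sqrt(deg(v))

# Same toy graph as before: two triangles joined by one edge.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
print(np.round(approx_spectral_embedding(A, k=2), 4))
```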

15-16. k-means Clustering
Input: X = {x_1, ..., x_n} with x_i ∈ R^k.
Output: A k-way partition (A_1*, ..., A_k*) of X such that (A_1*, ..., A_k*) = argmin over k-way partitions (A_1, ..., A_k) of X of Σ_{i=1}^k Σ_{x∈A_i} ‖x − c_i‖^2, where c_i is the center of A_i.
Def. The optimal k-means cost is Δ_k(X) = cost(A_1*, ..., A_k*).
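
A small sketch of the k-means objective. Δ_k(X) is the minimum of this cost over all k-way partitions and is NP-hard to compute in general, so in practice it is approximated, e.g. with Lloyd's algorithm / k-means++; scikit-learn is used below purely as an illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_cost(points, partition):
    """cost(A_1, ..., A_k) = sum_i sum_{x in A_i} ||x - c_i||^2, with c_i the mean of A_i."""
    return sum(((points[list(A)] - points[list(A)].mean(axis=0)) ** 2).sum()
               for A in partition if len(A) > 0)

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
parts = [np.flatnonzero(km.labels_ == i) for i in range(2)]
print(kmeans_cost(X, parts), km.inertia_)   # the two values agree at a Lloyd fixed point
```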

17. Outline: Problem Formulation (Algorithmic Tools); Our Contribution (Structural Result, Algorithmic Result); Proof Overview; Summary.

18-22. Structural Result
Setting: (P_1, ..., P_k) is an optimal k-way partition of G w.r.t. ρ(k), and (A_1, ..., A_k) is any k-way partition with cost(A_1, ..., A_k) ≤ γ · Δ_k(X_E) for some γ ≥ 1 (indices matched suitably).
Peng et al. [COLT'15]: with ρ(k) = max_{i∈[1:k]} φ(P_i) and Υ := λ_{k+1} / ρ(k), if Υ ≥ Ω(k^3) then
- μ(A_i Δ P_i) ≤ (γ / Υ) · μ(P_i)
- φ(A_i) ≤ (1 + γ / Υ) · φ(P_i) + γ / Υ
Our Result: with ρ_avr(k) = (1/k) · Σ_{i=1}^k φ(P_i) and Ψ := λ_{k+1} / ρ_avr(k), if Ψ ≥ Ω(k^3) then
- μ(A_i Δ P_i) ≤ (γ / (Ψ · k)) · μ(P_i)
- φ(A_i) ≤ (1 + γ / (Ψ · k)) · φ(P_i) + γ / (Ψ · k)
How do we find such a k-way partition (A_1, ..., A_k)?
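
The two gap quantities can be evaluated numerically once a k-way partition is fixed; for the partition achieving ρ(k) this gives exactly Υ and Ψ, while for any other candidate partition it is only indicative, since ρ(k) and ρ_avr(k) refer to the optimum. A small sketch with illustrative helper names:

```python
import numpy as np
from scipy.sparse import csgraph

def conductance(adj, S):
    """phi(S) for a vertex subset S."""
    S = list(S)
    T = [v for v in range(adj.shape[0]) if v not in set(S)]
    return adj[np.ix_(S, T)].sum() / adj[S].sum()

def gaps(adj, partition):
    """Upsilon = lam_{k+1} / max_i phi(P_i) and Psi = lam_{k+1} / rho_avr,
    evaluated for the given k-way partition (P_1, ..., P_k)."""
    k = len(partition)
    lam = np.linalg.eigvalsh(csgraph.laplacian(adj, normed=True))
    phis = [conductance(adj, P) for P in partition]
    return lam[k] / max(phis), lam[k] / np.mean(phis)

# Two triangles joined by one edge, with the natural 2-way partition.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
Upsilon, Psi = gaps(A, [{0, 1, 2}, {3, 4, 5}])
print(Upsilon, Psi)   # Psi >= Upsilon always, because rho_avr(k) <= rho(k)
```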

23. Outline: Problem Formulation (Algorithmic Tools); Our Contribution (Structural Result, Algorithmic Result); Proof Overview; Summary.

24-27. Algorithmic Result
Peng et al. [COLT'15]: requires Υ := λ_{k+1} / ρ(k) ≥ Ω(k^6), more restrictive by an Ω(k^2) factor (hidden constant = 10^5); concentration via the Heat Kernel and Locality Sensitive Hashing.
Our Result: requires Ψ := λ_{k+1} / ρ_avr(k) ≥ Ω(k^3) and Δ_k(X_V) ≥ n^{-O(1)} (hidden constant = 10^7 / ε_0, where ε_0 = 6/10^7 is the constant of Ostrovsky et al.'s [FOCS'13] k-means algorithm; the constant is not optimized!); uses the approximate Spectral Embedding and k-means Clustering.
This is the first rigorous algorithmic analysis of the Standard Spectral Clustering Paradigm!
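
To make the extra assumption Δ_k(X_V) ≥ n^{-O(1)} concrete, the sketch below estimates the k-means cost of the embedded point set X_V on a toy graph. Note that KMeans only yields an upper bound on Δ_k(X_V), so this is an indication rather than a verification of the assumption.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse import csgraph
from sklearn.cluster import KMeans

def embedding_points(adj, k):
    """X_V: one embedded point F(v) per vertex v."""
    deg = adj.sum(axis=1)
    L = csgraph.laplacian(adj, normed=True)
    _, U = eigh(L, subset_by_index=[0, k - 1])
    return U / np.sqrt(deg)[:, None]

# Two triangles joined by one edge.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
X_V = embedding_points(A, k=2)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_V)
print(km.inertia_)   # an upper bound on Delta_k(X_V) for this toy instance
```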
