Exact Recovery for a Family of Community-Detection Generative Models. Paolo Penna, joint work with Joachim Buhmann, Luca Corinzia, and Luca Mondada. ISE Group, Institute for Machine Learning, ETH Zurich 1
A Toy Problem: pick a random triangle, add noise. Find the triangle? 2
A Toy Problem: pick a random triangle; its edges get weight 1 (more generally, $\mu$) and all other edges get weight 0. Add noise $\sim \mathcal{N}(0, \sigma^2)$ to every edge, so triangle edges are $\sim \mathcal{N}(\mu, \sigma^2)$ and all other edges are $\sim \mathcal{N}(0, \sigma^2)$: signal vs noise, a random graph model. Find the triangle? Return the heaviest triangle. 3
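The toy problem can be simulated directly. The following is a minimal sketch, not the authors' code: it assumes a complete graph on `n` nodes, and the names `sample_toy_instance` and `heaviest_triangle` are hypothetical helpers introduced only for illustration.

```python
import itertools
import random


def sample_toy_instance(n, mu, sigma, rng):
    """Assumed setup: complete graph on n nodes with one planted triangle.

    Planted-triangle edges ~ N(mu, sigma^2), all other edges ~ N(0, sigma^2).
    """
    planted = frozenset(rng.sample(range(n), 3))
    weights = {}
    for u, v in itertools.combinations(range(n), 2):
        mean = mu if (u in planted and v in planted) else 0.0
        weights[(u, v)] = rng.gauss(mean, sigma)
    return planted, weights


def heaviest_triangle(n, weights):
    """Brute force over all triangles; return the one of maximum total weight."""
    def total(tri):
        a, b, c = tri  # combinations() yields a < b < c, matching the dict keys
        return weights[(a, b)] + weights[(a, c)] + weights[(b, c)]

    return frozenset(max(itertools.combinations(range(n), 3), key=total))


rng = random.Random(0)
planted, weights = sample_toy_instance(n=12, mu=10.0, sigma=1.0, rng=rng)
recovered = heaviest_triangle(12, weights)
print("planted:", sorted(planted), "recovered:", sorted(recovered))
```

With a large `mu` relative to `sigma`, the heaviest triangle is almost always the planted one; lowering `mu` shows where recovery starts to fail.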
Flavor of the Problem: random noise + planted solution = planted random model. Recover the planted solution? Many variants: planted clique, planted bisection, stochastic block model, ... 4
Triangles ⇓ Our General Model 5
Generalization #1: triangle ⇒ community of $k$ nodes 6
Planted Random Models: a community of $k$ nodes in a random graph or a weighted random graph. Related: Stochastic Block Model, Densest $k$-Subgraph Problem. 7
Generalization #2: edge ⇒ hyperedge of $h$ nodes 8
Our Model: $N$ nodes, $k$ (solution size), $h$ (hyperedge size) 9
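A small sampler makes the three parameters concrete. This is a sketch under stated assumptions rather than the authors' definition: it assumes every size-`h` subset of the `n` nodes is a hyperedge, that hyperedges lying entirely inside the planted `k`-subset have mean `mu`, and that all others have mean 0. The function names are hypothetical.

```python
import itertools
import random


def sample_instance(n, k, h, mu, sigma, rng):
    """Assumed model: complete h-uniform hypergraph with a planted k-subset."""
    planted = frozenset(rng.sample(range(n), k))
    weights = {}
    for edge in itertools.combinations(range(n), h):
        # Assumption: only hyperedges fully inside the planted set carry signal.
        mean = mu if planted.issuperset(edge) else 0.0
        weights[edge] = rng.gauss(mean, sigma)
    return planted, weights


def solution_weight(solution, h, weights):
    """Total weight of the hyperedges contained in a candidate k-subset."""
    return sum(weights[e] for e in itertools.combinations(sorted(solution), h))


def heaviest_solution(n, k, h, weights):
    """Brute-force search over all k-subsets (exponential; small n only)."""
    best = max(itertools.combinations(range(n), k),
               key=lambda s: solution_weight(s, h, weights))
    return frozenset(best)


rng = random.Random(0)
planted, weights = sample_instance(n=9, k=4, h=3, mu=10.0, sigma=1.0, rng=rng)
recovered = heaviest_solution(9, 4, 3, weights)
print("planted:", sorted(planted), "recovered:", sorted(recovered))
```

Setting `k=3, h=2` gives back the triangle toy problem.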
The Simplest Model: $M$ solutions $1, 2, \dots, M$ with independent weights $\sim \mathcal{N}(0, \sigma^2)$: the Random Energy Model (REM). Planted Random Energy Model (P-REM): one planted solution has weight $\sim \mathcal{N}(\mu, \sigma^2)$ instead. Recover the planted solution? Return the one of maximum weight: Maximum Likelihood (ML). 10
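The P-REM and its estimator fit in a few lines. The sketch below assumes $\mu > 0$ and a uniformly random planted index, in which case the likelihood of "solution $i$ is planted" is increasing in the weight of $i$, so maximum likelihood reduces to an argmax. The names `sample_prem` and `ml_estimate` are hypothetical.

```python
import random


def sample_prem(m, mu, sigma, rng):
    """m independent weights ~ N(0, sigma^2); one planted index ~ N(mu, sigma^2)."""
    planted = rng.randrange(m)
    weights = [rng.gauss(0.0, sigma) for _ in range(m)]
    weights[planted] = rng.gauss(mu, sigma)
    return planted, weights


def ml_estimate(weights):
    """Return the index of maximum weight (ML under the assumptions above)."""
    return max(range(len(weights)), key=weights.__getitem__)


rng = random.Random(0)
planted, weights = sample_prem(m=1000, mu=10.0, sigma=1.0, rng=rng)
print("planted:", planted, "estimate:", ml_estimate(weights))
```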
“Simple vs Hard”: P-REM has $M$ solutions $1, 2, \dots, M$ with independent weights; the random graph has $\binom{N}{k} = M$ solutions with dependent weights. P-REM: search is hard. Random graph: search maybe easier. 11
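The dependence comes from shared edges: two candidate solutions that overlap in several nodes sum some of the same edge weights. As a rough illustration, assuming independent noise of variance $\sigma^2$ per hyperedge, the covariance of two solution weights is $\sigma^2$ times the number of hyperedges they share. The helper names below are hypothetical.

```python
import math


def num_solutions(n, k):
    """Number of candidate solutions M = C(n, k)."""
    return math.comb(n, k)


def shared_hyperedges(sol_a, sol_b, h):
    """Number of size-h hyperedges contained in both solutions."""
    return math.comb(len(set(sol_a) & set(sol_b)), h)


print(num_solutions(10, 4))                             # candidate 4-subsets of 10 nodes
print(shared_hyperedges((0, 1, 2, 3), (0, 1, 2, 4), 2))  # overlap in 3 nodes
print(shared_hyperedges((0, 1, 2, 3), (4, 5, 6, 7), 2))  # disjoint solutions
```

Disjoint solutions share nothing and behave like independent REM entries; heavily overlapping ones are strongly correlated.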
Our Contribution 12
P-REM (solutions $1, 2, \dots, M$): recover the planted solution? Signal-to-noise ratio $= \hat{\mu}/\hat{\sigma}$, with threshold 1. As $M \to \infty$: $P(\text{success}) \to 0$ below 1 (fail), $P(\text{success}) \to 1$ above 1 (success). 13
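A heuristic for why a sharp threshold appears, written as a worked estimate. The slide does not define the hatted quantities; the normalization $\hat{\mu} = \mu / \sqrt{2 \ln M}$, $\hat{\sigma} = \sigma$ used below is an assumption chosen so that the threshold sits at 1, and $i^*$ denotes the planted index.

```latex
% Heuristic sketch, assuming \mu > 0 and independent non-planted weights.
\begin{align*}
  \max_{i \neq i^*} w_i &\approx \sigma \sqrt{2 \ln M},
  & w_{i^*} &= \mu + \sigma Z, \quad Z \sim \mathcal{N}(0, 1). \\
\intertext{The fluctuation $\sigma Z$ stays bounded in probability while
$\sqrt{2 \ln M} \to \infty$, so for any fixed $\varepsilon > 0$:}
  \mu &> (1 + \varepsilon)\, \sigma \sqrt{2 \ln M}
  & &\Longrightarrow \quad P(\text{success}) \to 1, \\
  \mu &< (1 - \varepsilon)\, \sigma \sqrt{2 \ln M}
  & &\Longrightarrow \quad P(\text{success}) \to 0. \\
\intertext{With the assumed normalization this reads}
  \frac{\hat{\mu}}{\hat{\sigma}} &= \frac{\mu}{\sigma \sqrt{2 \ln M}} \gtrless 1.
\end{align*}
```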
Overview of Contribution: $N$ nodes, $k$ (solution), $h$ (hyperedge). The family of problems “collapses” to P-REM: $h = k$ gives the P-REM (solutions $1, 2, \dots, M$); $h = 1$ gives the $k$-planted REM. Technique (“reduction”): the P-REM threshold at 1 (fail below, success above) turns into two thresholds for the problem: fail below $\gamma_{\text{fail}}$, success above $\gamma_{\text{succ}}$. 14
Bounds: fail below $\gamma_{\text{fail}}$, success above $\gamma_{\text{succ}}$, for $k = o(\log N)$.
$h$ | Model | $\gamma_{\text{fail}}$ | $\gamma_{\text{succ}}$
$1$ | $k$-P-REM | $1$ | $1$
$2$ | Graph | $\sqrt{\frac{1}{k-1}}$ | $2\sqrt{\frac{1}{k-1}}$
$2 < h < k$ | Hypergraph | $\sqrt{\frac{1}{\binom{k-1}{h-1}}}$ | $2\sqrt{\frac{h}{\binom{k-1}{h-1}}}$
$k$ | P-REM | $1$ | $1$
Larger $k$: recovery easier. Recover the planted solution? Maximum likelihood (ML). 15
Connections to Other Works: models by edge distributions $p$, $q$ and community size $k$.
Planted Clique: Bernoulli, $p = 1$, $q = 1/2$; $k = \sqrt{N}$, $k = \Theta(\log N)$. “A nearly tight sum-of-squares lower bound for the planted clique problem” (Barak et al., FOCS ’16), ...
Bisection: Bernoulli $p$, $q$; $k = N/2$.
Stochastic Block Model: Bernoulli $p$, $q$. “Exact recovery in the stochastic block model” (Abbe, Bandeira, and Hall, IEEE Transactions on Information Theory ’16); “Consistency thresholds for the planted bisection model” (Mossel, Neeman, and Sly, STOC ’15); “Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery” (Abbe and Sandon, FOCS ’15).
Weighted Stochastic Block Model: generic distributions $p$, $q$; $k = N/c$. “Information-theoretic bounds for exact recovery in weighted stochastic block models using the Rényi divergence” (Jog and Loh, arXiv 2015).
Hypergraph Stochastic Block Model: Bernoulli $p$, $q$. “Consistency of spectral partitioning of uniform hypergraphs under planted partition model” (Ghoshdastidar and Dukkipati, NIPS ’14); “Consistency of spectral hypergraph partitioning under planted partition model” (Ghoshdastidar et al., The Annals of Statistics ’17).
Weighted Hypergraph Stochastic Block Model: Gaussian $p$, $q$; $k = N/2$. “Community detection in hypergraphs, spiked tensor models, and Sum-of-Squares” (Kim, Bandeira, and Goemans, Int. Conf. on Sampling Theory and Applications ’17). 16
Open Questions: Computational Aspects - trade-off (dependency, recoverability, hardness); Other Problems - our technique (“reduce” to REM) 17
Thank You!! 18