The condensation threshold in stochastic block models

Joe Neeman (with Jess Banks, Cris Moore, Praneeth Netrapalli)

Austin, May 9, 2016
Stochastic block model G(n, k, a, b)

1. n nodes, k colors, about n/k nodes of each color
2. connect u to v with probability a/n if the same color, b/n if different colors
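As an illustration (not from the talk), a naive sampler for this model; the function name and the uniform color assignment are my own choices:

```python
import random

def sample_sbm(n, k, a, b, seed=0):
    """Sample (G, sigma) from G(n, k, a, b): each node gets a uniform
    color in {0, ..., k-1}, and u ~ v independently with probability
    a/n (same color) or b/n (different colors)."""
    rng = random.Random(seed)
    sigma = [rng.randrange(k) for _ in range(n)]  # about n/k nodes per color
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = a / n if sigma[u] == sigma[v] else b / n
            if rng.random() < p:
                edges.append((u, v))
    return edges, sigma

# average degree concentrates around d = (a + (k-1)b)/k = 6 here
edges, sigma = sample_sbm(200, 2, 10, 2)
```

The quadratic loop is fine at this scale; a sparse-graph sampler would instead draw the within- and between-block edge counts directly.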
Problem I: detecting

Given the (uncolored) graph, recover the colors (up to permutation) better than a random guess.

Definition. Let σ_v ∈ {1, ..., k} be the color of v. For another coloring τ,

    Olap(σ, τ) = max_π ( #{v ∈ V : σ_v = π(τ_v)} / n − 1/k ),

where the max is over all permutations π on {1, ..., k}.

Definition. (G_n, σ_n) ∼ G(n, k, a, b) is detectable if there exist ϵ > 0 and maps A_n : {graphs} → {labellings} such that

    lim inf_{n→∞} Pr( Olap(σ_n, A_n(G_n)) > ϵ ) > ϵ.

Otherwise it is undetectable.
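For small k the overlap can be computed by brute force over all permutations; this sketch (my own code, not from the talk, using 0-indexed colors) shows why Olap is 0 for a random guess and 1 − 1/k for perfect recovery:

```python
from itertools import permutations

def overlap(sigma, tau, k):
    """Olap(sigma, tau) = max over permutations pi of
    #{v : sigma_v = pi(tau_v)} / n  -  1/k."""
    n = len(sigma)
    best = 0.0
    for pi in permutations(range(k)):
        agree = sum(1 for s, t in zip(sigma, tau) if s == pi[t])
        best = max(best, agree / n)
    return best - 1 / k

# perfect recovery up to a relabeling of the colors:
print(overlap([0, 0, 1, 1], [1, 1, 0, 0], 2))  # 0.5 = 1 - 1/k
```

Brute force costs k! permutations; for large k one would solve a maximum-weight bipartite matching on the k × k confusion matrix instead.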
Problem II: distinguishing

Given the (uncolored) graph, did it come from G(n, k, a, b) or G(n, d/n), where d = (a + (k−1)b)/k?

Definition. Sequences P_n and Q_n of probability measures are
• contiguous if P_n(A_n) → 0 iff Q_n(A_n) → 0
• orthogonal if ∃ A_n with P_n(A_n) → 0 and Q_n(A_n) → 1.

Say that G(n, k, a, b) is
• distinguishable if it is orthogonal to G(n, d/n)
• indistinguishable if it is contiguous with G(n, d/n)
Better parametrization

• a/n = within-block edge probability
• b/n = between-block edge probability
• k = number of blocks

    d = (a + (k−1)b)/k
    λ = (a − b)/(a + (k−1)b)

Note λ ∈ [−1/(k−1), 1].
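A one-line check of this reparametrization (illustrative code, not from the talk): d is the average degree and λ measures the signal strength, with λ = 1 at b = 0 and λ = −1/(k−1) at a = 0.

```python
def reparametrize(a, b, k):
    """Map (a, b, k) to (d, lam) for G(n, k, a, b)."""
    d = (a + (k - 1) * b) / k
    lam = (a - b) / (a + (k - 1) * b)
    assert -1 / (k - 1) <= lam <= 1  # always holds for a, b >= 0
    return d, lam

print(reparametrize(10, 2, 2))  # d = 6.0, lam = 2/3
```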
Phase diagram for k = 2

[Figure: the (d, λ) plane for 0 ≤ d ≤ 20; the curve λ²d = 1 separates the detectable, distinguishable region (above) from the undetectable, indistinguishable region (below).]

(Mossel/N/Sly, Massoulié)
Conjectured phase diagram for k = 20

[Figure: the (d, λ) plane for 0 ≤ d ≤ 1000 with three regions: above the curve λ²d = 1, detectable and distinguishable; between that curve and the information-theoretic threshold, detectable but hard, distinguishable; below, undetectable and indistinguishable.]

(Decelle, Krzakala, Moore, Zdeborova)
What we know for k = 20

[Figure: the (d, λ) plane for 0 ≤ d ≤ 1000 with three regions: detectable (quickly), distinguishable (Bordenave/Lelarge/Massoulié, Abbe/Sandon); detectable, distinguishable (Abbe/Sandon, this work); undetectable, indistinguishable (this work).]
Theorem (Banks/Moore/N/Netrapalli)

    d_+ = 2k log k / [ (1 + (k−1)λ) log(1 + (k−1)λ) + (k−1)(1−λ) log(1−λ) ]
    d_− = 2 log(k−1) / ( (k−1) λ² )

• d > d_+ implies detectability, distinguishability.
• d < d_− implies undetectability, indistinguishability.

If k is large enough then there are λ such that d_+ < 1/λ², giving the yellow region.

    lim_{k→∞} d_−/d_+ = ( (1+µ) log(1+µ) − µ ) / µ²,  where µ = (a − b)/d.

If µ ≈ −1 then lim_{k→∞} d_−/d_+ ≈ 1 (planted coloring / giant), so the bounds are nearly tight.
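A quick numerical check of the theorem's bounds (my own code; the formulas are transcribed from the theorem): already at k = 20 there are λ, e.g. λ = 0.05, for which d_+ lies below the Kesten–Stigum value 1/λ².

```python
from math import log

def d_plus(k, lam):
    """Upper bound: d > d_+ implies detectability, distinguishability."""
    num = 2 * k * log(k)
    den = (1 + (k - 1) * lam) * log(1 + (k - 1) * lam) \
        + (k - 1) * (1 - lam) * log(1 - lam)
    return num / den

def d_minus(k, lam):
    """Lower bound: d < d_- implies undetectability, indistinguishability."""
    return 2 * log(k - 1) / ((k - 1) * lam ** 2)

k, lam = 20, 0.05
print(d_plus(k, lam) < 1 / lam ** 2)  # True: detectable below Kesten-Stigum
```

Here d_+(20, 0.05) ≈ 318 while 1/λ² = 400, so the information-theoretic bound beats the conjectured algorithmic threshold in this range.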
The proofs

[The k = 20 phase diagram again, for reference.]
Detecting/distinguishing inefficiently

Consider partitions of G into k equal parts. A partition is good if its average in-degree is ≈ a/k and its average out-degree is ≈ (k−1)b/k.

For suitable a, b, k, w.h.p.
• G(n, k, a, b): all good partitions are correlated with the truth.
• G(n, d/n): there are no good partitions.

Proof: concentration + union bound.

Distinguishing: check if there is a good partition. Detecting: find a good partition.

Abbe/Sandon improved this for small d by taking the giant component and pruning trees.
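The goodness test for a given balanced partition can be sketched as follows (hypothetical helper, not from the talk; the fixed tolerance `tol` is illustrative — the actual argument uses concentration bounds rather than a fixed slack):

```python
def is_good(n, k, edges, parts, a, b, tol=0.25):
    """Check whether a balanced k-partition is 'good': average
    within-part degree close to a/k and average between-part
    degree close to (k-1)*b/k."""
    within = sum(1 for u, v in edges if parts[u] == parts[v])
    between = len(edges) - within
    # each edge contributes 2 to the relevant total degree count
    avg_in = 2 * within / n
    avg_out = 2 * between / n
    return abs(avg_in - a / k) <= tol * a / k and \
           abs(avg_out - (k - 1) * b / k) <= tol * (k - 1) * b / k

# toy example: edges (0,1) and (2,3) within, (0,2) between
print(is_good(4, 2, [(0, 1), (2, 3), (0, 2)], [0, 0, 1, 1], a=2, b=1))  # True
```

Exhaustively searching for a good partition is of course exponential in n; this is exactly why the argument gives detectability/distinguishability but not an efficient algorithm.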