Community Detection on an Euclidean Random Graph Abishek Sankararaman, Emmanuel Abbe and François Baccelli Jan 2020
Community Detection - Abstract Definition • Grouping objects given indirect information of memberships. A population partitioned into groups
Community Detection - Examples
1. People on an online social network.
2. Proteins classified into groups based on their functional behavior.
3. Base stations grouped by similarities in traffic pattern.
Graph as Information: an important sub-class
• Population - represented as nodes of a graph.
• Membership information - encoded as labeled edges of the graph.
• Graph clustering problem - given unlabeled graph data, recover the partition of nodes.
Graph Clustering
• Graph clustering - given unlabeled graph data, recover the partition of nodes.
• What if there is additional contextual information on each node?
• Web pages - the textual content of a page.
• Social networks - personal information (age, location, income, …).
• Computational biology - metadata generated by measurements.
Planted Partition Random Connection Model
• Vertex set: $\{1, 2, \dots, N_n\}$, where $N_n$ is the number of nodes.
• Each node $i \in [1, N_n]$ has two labels: a location label $X_i \in \mathbb{R}^d$ and a community label $Z_i \in \{-1, +1\}$.
Random Graph Parameters
• Intensity: $\lambda > 0$.
• Dimension of embedding: $d \ge 2$.
• Connection functions: $f_{in}(\cdot), f_{out}(\cdot) : \mathbb{R}_+ \to [0, 1]$ such that $f_{in}(r) \ge f_{out}(r)$ for all $r \ge 0$.
[Figure: $f_{in}(r)$ and $f_{out}(r)$ plotted against $r$, with values between 0 and 1.]
Planted Partition Random Connection Model
1) Number of nodes: $N_n \sim \mathrm{Poisson}(\lambda n)$. On average, $\lambda$ points per unit area.
2) Each node $i \in [1, N_n]$ has
- a location label $X_i \in \left[-\frac{n^{1/d}}{2}, \frac{n^{1/d}}{2}\right]^d$,
- a community label $Z_i \in \{-1, +1\}$,
sampled independently and uniformly.
3) An edge is placed between $i, j \in [1, N_n]$ with probability either
- $f_{in}(\|X_i - X_j\|)$ if $Z_i = Z_j$ (same colors), or
- $f_{out}(\|X_i - X_j\|)$ if $Z_i \neq Z_j$ (different colors).
For all $r \ge 0$, $1 \ge f_{in}(r) \ge f_{out}(r) \ge 0$: more edges within communities than across. Conditional on node labels, edges are independent.
[Figure: nodes placed in a box whose sides are labeled $\sqrt{n}$, which equals $n^{1/d}$ when $d = 2$.]
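The three generative steps above can be sketched in code. This is a minimal illustration and not the authors' implementation: the function names `sample_poisson` and `sample_graph`, the step-function choices of `f_in` and `f_out`, and the parameter values are all assumptions made for the example. It uses only the Python standard library and a quadratic loop over pairs, so it is meant for small $n$.

```python
import math
import random

def sample_poisson(mean, rng):
    """Sample Poisson(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_graph(n, lam, d, f_in, f_out, seed=0):
    """Sample (X, Z, edges) following steps 1)-3) of the slide."""
    rng = random.Random(seed)
    # Step 1: Poisson(lam * n) number of nodes.
    num_nodes = sample_poisson(lam * n, rng)
    # Step 2: uniform locations in [-n^(1/d)/2, n^(1/d)/2]^d and uniform +/-1 labels.
    half = n ** (1.0 / d) / 2.0
    X = [[rng.uniform(-half, half) for _ in range(d)] for _ in range(num_nodes)]
    Z = [rng.choice((-1, 1)) for _ in range(num_nodes)]
    # Step 3: independent edges, given the labels.
    edges = []
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            r = math.dist(X[i], X[j])
            p = f_in(r) if Z[i] == Z[j] else f_out(r)
            if rng.random() < p:
                edges.append((i, j))
    return X, Z, edges

# Hypothetical connection functions (not from the slides): constant up to distance 1.
def f_in(r):
    return 0.8 if r <= 1.0 else 0.0

def f_out(r):
    return 0.2 if r <= 1.0 else 0.0

X, Z, edges = sample_graph(n=100, lam=2.0, d=2, f_in=f_in, f_out=f_out, seed=1)
print(len(X), "nodes,", len(edges), "edges")
```

With these hypothetical step functions, every sampled edge joins two nodes at distance at most 1, and same-community pairs within that range are connected four times as often as opposite-community pairs.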
Planted Partition Random Connection Model
1) $\{X_i\}_{i \in \mathbb{N}}$ - a Poisson point process on $\mathbb{R}^d$ with intensity $\lambda$.
2) Independently mark it with $\{Z_i\}_{i \in \mathbb{N}}$, each of which is uniform over $\{-1, 1\}$.
3) Connect any two nodes $i \neq j \in \mathbb{N}$ with probability
$f_{in}(\|X_i - X_j\|)\,\mathbf{1}_{Z_i = Z_j} + f_{out}(\|X_i - X_j\|)\,\mathbf{1}_{Z_i \neq Z_j}$,
independently for all pairs.
$G_n$ = $G$ restricted to $\left[-\frac{n^{1/d}}{2}, \frac{n^{1/d}}{2}\right]^d$.
[Figure: the box of side $\sqrt{n}$ to which $G$ is restricted.]
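Planted Partition Random Connection Model - Model Parameters
• Intensity: $\lambda > 0$.
• Dimension of embedding: $d \ge 2$.
• Connection functions: $f_{in}(\cdot), f_{out}(\cdot) : \mathbb{R}_+ \to [0, 1]$ such that $f_{in}(r) \ge f_{out}(r)$ for all $r \ge 0$.
[Figure: $f_{in}(r)$ and $f_{out}(r)$ plotted against $r$, with values between 0 and 1.]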
Planted Partition Random Connection Model
Assume $\int_{x \in \mathbb{R}^d} f_{out}(\|x\|)\,dx \le \int_{x \in \mathbb{R}^d} f_{in}(\|x\|)\,dx < \infty$.
Average number of neighbors in the
- same community is $(\lambda/2) \int_{x \in \mathbb{R}^d} f_{in}(\|x\|)\,dx - o(1)$,
- opposite community is $(\lambda/2) \int_{x \in \mathbb{R}^d} f_{out}(\|x\|)\,dx - o(1)$.
Constant average degree.
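The two limiting averages can be evaluated numerically for a given pair of connection functions. The sketch below is an illustration under stated assumptions: it rewrites the integral over $\mathbb{R}^d$ in polar coordinates, assumes the connection function vanishes beyond a user-supplied radius `r_max`, and ignores the $o(1)$ boundary correction. The helper names and the step-function example are hypothetical.

```python
import math

def integral_over_Rd(f, d, r_max, steps=100000):
    """Approximate the integral of f(||x||) over R^d.

    Uses polar coordinates: area(unit sphere in R^d) * int_0^r_max f(r) r^(d-1) dr,
    evaluated with the midpoint rule. Assumes f(r) = 0 for r > r_max.
    """
    sphere_area = 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)
    h = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        total += f(r) * r ** (d - 1)
    return sphere_area * total * h

def mean_degrees(lam, d, f_in, f_out, r_max):
    """Limiting mean number of same- and opposite-community neighbors."""
    same = (lam / 2.0) * integral_over_Rd(f_in, d, r_max)
    opposite = (lam / 2.0) * integral_over_Rd(f_out, d, r_max)
    return same, opposite

# Hypothetical connection functions (not from the slides): constant up to distance 1.
def f_in(r):
    return 0.8 if r <= 1.0 else 0.0

def f_out(r):
    return 0.2 if r <= 1.0 else 0.0

same, opposite = mean_degrees(lam=2.0, d=2, f_in=f_in, f_out=f_out, r_max=1.0)
print(same, opposite)
```

For this example the integrals have closed forms, $0.8\pi$ and $0.2\pi$, so with $\lambda = 2$ the two averages are approximately $0.8\pi$ and $0.2\pi$. Neither depends on $n$, which is the sense in which the average degree is constant.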
Community Detection Problem
Given $G_n$ and $\{X_i\}_{i \in [1, N_n]}$, estimate $\{Z_i\}_{i \in [1, N_n]}$.
$\{\tau_i\}_{i \in [1, N_n]}$ - community estimates.
[Figure: the graph $G_n$ in the box of side $\sqrt{n}$.]
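The slide does not say how the estimates $\tau_i$ are compared with the true labels $Z_i$. One common choice in community detection, assumed here for illustration only, is the fraction of agreeing nodes maximized over a global sign flip, since the two communities are symmetric and can be identified only up to swapping $+1$ and $-1$. The function name `overlap` is hypothetical.

```python
def overlap(tau, Z):
    """Fraction of nodes whose estimate matches the truth, up to a global sign flip.

    tau and Z are equal-length sequences with entries in {-1, +1}.
    """
    if len(tau) != len(Z):
        raise ValueError("tau and Z must have the same length")
    if not Z:
        raise ValueError("need at least one node")
    agree = sum(1 for t, z in zip(tau, Z) if t == z)
    # Flipping every estimate turns each agreement into a disagreement and vice versa.
    return max(agree, len(Z) - agree) / len(Z)

truth = [1, 1, -1, -1]
print(overlap([1, 1, -1, -1], truth))   # perfect recovery
print(overlap([-1, -1, 1, 1], truth))   # perfect recovery with labels swapped
print(overlap([1, -1, 1, -1], truth))   # no better than a coin flip
```

Under this score a value near 1/2 means the estimates carry no information about the partition, and a value of 1 means exact recovery of the partition.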