community detection on an euclidean random graph

Community Detection on an Euclidean Random Graph Abishek - PowerPoint PPT Presentation

Community Detection on an Euclidean Random Graph. Abishek Sankararaman, Emmanuel Abbe and François Baccelli. Jan 2020. Community Detection - Abstract Definition: grouping objects given indirect information of memberships. A population partitioned into groups.


  1. Community Detection on an Euclidean Random Graph Abishek Sankararaman, Emmanuel Abbe and François Baccelli Jan 2020

  2. Community Detection - Abstract Definition • Grouping objects given indirect information of memberships. A population partitioned into groups

  3. Community Detection - Examples • Grouping objects given indirect information of memberships. A population partitioned into groups 1. People on an Online Social Network. 2. Proteins classified into groups based on their functional behavior. 3. Grouping Base-Stations based on similarities in traffic pattern.

  4. Graph as Information. An important sub-class: Population - represented as nodes of a graph. Membership Information - encoded as labeled edges of the graph. Graph Clustering Problem - given unlabeled graph data, recover the partition of nodes.

  5. Graph Clustering. Graph Clustering - given unlabeled graph data, recover the partition of nodes. What if there is additional contextual information on each node? Web pages - the textual content in a page. Social Networks - personal information (age, location, income, ...). Computational Biology - metadata generated by measurements.

  6. Planted Partition Random Connection Model

  7. Planted Partition Random Connection Model. Vertex set $\{1, 2, \dots, N_n\}$, where $N_n$ is the number of nodes. Each node $i \in [1, N_n]$ has two labels: a location label $X_i \in \mathbb{R}^d$ and a community label $Z_i \in \{-1, 1\}$.

  8. Planted Partition Random Connection Model. Vertex set $\{1, 2, \dots, N_n\}$, where $N_n$ is the number of nodes. Each node $i \in [1, N_n]$ has two labels: a location label $X_i \in \mathbb{R}^d$ and a community label $Z_i \in \{-1, 1\}$. Random Graph Parameters: intensity $\lambda > 0$; dimension of embedding $d \geq 2$; connection functions $f_{in}(\cdot), f_{out}(\cdot) : \mathbb{R}_+ \to [0, 1]$ such that $\forall r \geq 0$, $f_{in}(r) \geq f_{out}(r)$. [Figure: plot of $f_{in}(r)$ and $f_{out}(r)$ against $r$, with values between 0 and 1.]
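The constraint on the connection functions can be made concrete with a small sketch. The exponentially decaying functions below are hypothetical choices made only for illustration (the slides do not fix any particular $f_{in}$ or $f_{out}$), and the check runs on a finite grid of radii, so it is a sanity check rather than a proof.

```python
import math

# Hypothetical connection functions, chosen only for illustration:
# both decay with distance and satisfy 1 >= f_in(r) >= f_out(r) >= 0.
def f_in(r: float) -> float:
    return 0.9 * math.exp(-r)

def f_out(r: float) -> float:
    return 0.3 * math.exp(-r)

def satisfies_ordering(f_in, f_out, radii) -> bool:
    """Check 0 <= f_out(r) <= f_in(r) <= 1 at every radius in `radii`."""
    return all(0.0 <= f_out(r) <= f_in(r) <= 1.0 for r in radii)

# Finite grid of radii 0.0, 0.1, ..., 19.9 (an arbitrary choice).
grid = [0.1 * k for k in range(200)]
print(satisfies_ordering(f_in, f_out, grid))
```

Swapping the two arguments makes the check fail, which is the sense in which edges are more likely within a community than across communities at every distance.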

  9. Planted Partition Random Connection Model

  10. Planted Partition Random Connection Model. 1) Number of nodes $N_n \sim \mathrm{Poisson}(\lambda n)$: on average, $\lambda$ points per unit area.

  11. Planted Partition Random Connection Model. 1) Number of nodes $N_n \sim \mathrm{Poisson}(\lambda n)$: on average, $\lambda$ points per unit area. 2) Each node $i \in [1, N_n]$ has a location label $X_i \in \left[-\frac{n^{1/d}}{2}, \frac{n^{1/d}}{2}\right]^d$, sampled independently and uniformly.

  12. Planted Partition Random Connection Model. 1) Number of nodes $N_n \sim \mathrm{Poisson}(\lambda n)$: on average, $\lambda$ points per unit area. 2) Each node $i \in [1, N_n]$ has a location label $X_i \in \left[-\frac{n^{1/d}}{2}, \frac{n^{1/d}}{2}\right]^d$, sampled independently and uniformly. [Figure: the sampling window, with sides labeled $\sqrt{n}$.]

  13. Planted Partition Random Connection Model. 1) Number of nodes $N_n \sim \mathrm{Poisson}(\lambda n)$: on average, $\lambda$ points per unit area. 2) Each node $i \in [1, N_n]$ has a location label $X_i \in \left[-\frac{n^{1/d}}{2}, \frac{n^{1/d}}{2}\right]^d$ and a community label $Z_i \in \{-1, +1\}$, sampled independently and uniformly. [Figure: the sampling window, with sides labeled $\sqrt{n}$.]

  14. Planted Partition Random Connection Model.
      1) Number of nodes $N_n \sim \mathrm{Poisson}(\lambda n)$: on average, $\lambda$ points per unit area.
      2) Each node $i \in [1, N_n]$ has a location label $X_i \in \left[-\frac{n^{1/d}}{2}, \frac{n^{1/d}}{2}\right]^d$ and a community label $Z_i \in \{-1, +1\}$, sampled independently and uniformly.
      3) An edge between $i, j \in [1, N_n]$ is present with probability either $f_{in}(\|X_i - X_j\|)$ if $Z_i = Z_j$ (same colors), or $f_{out}(\|X_i - X_j\|)$ if $Z_i \neq Z_j$ (different colors).
      Since $\forall r \geq 0$, $1 \geq f_{in}(r) \geq f_{out}(r) \geq 0$, there are more edges within communities than across. Conditional on node labels, edges are independent.
      [Figure: the sampling window, with sides labeled $\sqrt{n}$.]
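The three sampling steps of the model can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the names `sample_poisson` and `sample_graph`, the hard-threshold connection functions, and the parameter values are all hypothetical, and nodes are indexed from 0 rather than 1 to follow Python conventions.

```python
import math
import random

def sample_poisson(mean: float, rng: random.Random) -> int:
    """Count unit-rate exponential arrivals in [0, mean]; the count is Poisson(mean)."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_graph(lam, n, d, f_in, f_out, seed=0):
    """Sample locations X, community labels Z and the edge list of the model."""
    rng = random.Random(seed)
    # Step 1: Poisson(lam * n) number of nodes.
    num_nodes = sample_poisson(lam * n, rng)
    # Step 2: uniform locations in [-n^(1/d)/2, n^(1/d)/2]^d and uniform +/-1 labels.
    half = n ** (1.0 / d) / 2.0
    X = [[rng.uniform(-half, half) for _ in range(d)] for _ in range(num_nodes)]
    Z = [rng.choice((-1, 1)) for _ in range(num_nodes)]
    # Step 3: independent edges, with probability f_in or f_out of the distance.
    edges = []
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            r = math.dist(X[i], X[j])
            p = f_in(r) if Z[i] == Z[j] else f_out(r)
            if rng.random() < p:
                edges.append((i, j))
    return X, Z, edges

# Hypothetical hard-threshold connection functions (illustration only).
RADIUS = 1.0

def f_in(r):
    return 0.8 if r <= RADIUS else 0.0

def f_out(r):
    return 0.2 if r <= RADIUS else 0.0

X, Z, edges = sample_graph(lam=2.0, n=50, d=2, f_in=f_in, f_out=f_out, seed=1)
print(len(X), "nodes,", len(edges), "edges")
```

The double loop over pairs is quadratic in the number of nodes, which is acceptable for a small illustration; a spatial index would be the natural choice for large $n$ when the connection functions have bounded support.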

  15. Planted Partition Random Connection Model.
      1) $\{X_i\}_{i \in \mathbb{N}}$ - a Poisson Point Process on $\mathbb{R}^d$ with intensity $\lambda$.
      2) Independently mark it with $\{Z_i\}_{i \in \mathbb{N}}$, each of which is uniform over $\{-1, 1\}$.
      3) Connect any two nodes $i \neq j \in \mathbb{N}$ with probability $f_{in}(\|X_i - X_j\|)\,\mathbf{1}_{Z_i = Z_j} + f_{out}(\|X_i - X_j\|)\,\mathbf{1}_{Z_i \neq Z_j}$, independently for all pairs.
      $G_n$ = $G$ restricted to $\left[-\frac{n^{1/d}}{2}, \frac{n^{1/d}}{2}\right]^d$.
      [Figure: the sampling window, with sides labeled $\sqrt{n}$.]

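  16. Planted Partition Random Connection Model. Model Parameters: intensity $\lambda > 0$; dimension of embedding $d \geq 2$; connection functions $f_{in}(\cdot), f_{out}(\cdot) : \mathbb{R}_+ \to [0, 1]$ such that $\forall r \geq 0$, $f_{in}(r) \geq f_{out}(r)$. [Figure: plot of $f_{in}(r)$ and $f_{out}(r)$ against $r$, with values between 0 and 1.]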

  17. Planted Partition Random Connection Model. Assume $\int_{x \in \mathbb{R}^d} f_{out}(\|x\|)\,dx \leq \int_{x \in \mathbb{R}^d} f_{in}(\|x\|)\,dx < \infty$. Avg # of neighbors in the same community is $(\lambda/2) \int_{x \in \mathbb{R}^d} f_{in}(\|x\|)\,dx - o(1)$; in the opposite community it is $(\lambda/2) \int_{x \in \mathbb{R}^d} f_{out}(\|x\|)\,dx - o(1)$. Constant avg degree. [Figure: the sampling window, with sides labeled $\sqrt{n}$.]
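The leading terms of the two average degrees can be evaluated numerically. The sketch below assumes radially decaying connection functions (the exponential choices are hypothetical) and uses the polar-coordinate identity $\int_{\mathbb{R}^d} f(\|x\|)\,dx = S_d \int_0^\infty f(r)\, r^{d-1}\,dr$, where $S_d = 2\pi^{d/2}/\Gamma(d/2)$ is the surface area of the unit sphere. The integral is truncated at a finite radius, and the $o(1)$ correction from the slide is not computed.

```python
import math

def radial_integral(f, d, r_max=50.0, steps=100_000):
    """Approximate the integral of f(||x||) over R^d with the midpoint rule.

    Truncating at r_max is an assumption: f must be negligible beyond it.
    """
    surface = 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)
    h = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += f(r) * r ** (d - 1)
    return surface * total * h

def limiting_mean_degrees(lam, d, f_in, f_out):
    """Leading terms (lam/2) * integral for same- and opposite-community neighbors."""
    same = (lam / 2.0) * radial_integral(f_in, d)
    opposite = (lam / 2.0) * radial_integral(f_out, d)
    return same, opposite

# Hypothetical connection functions (illustration only).
def f_in(r):
    return 0.9 * math.exp(-r)

def f_out(r):
    return 0.3 * math.exp(-r)

same, opposite = limiting_mean_degrees(lam=1.0, d=2, f_in=f_in, f_out=f_out)
print(same, opposite)
```

For these choices in $d = 2$ the integrals have closed forms, $\int_{\mathbb{R}^2} c\,e^{-\|x\|}\,dx = 2\pi c$, so the two values should be close to $0.9\pi$ and $0.3\pi$; neither depends on $n$, which is the constant average degree noted on the slide.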

  18. Community Detection Problem. Given $G_n$ and $\{X_i\}_{i \in [0, N_n]}$, estimate $\{Z_i\}_{i \in [1, N_n]}$. $\{\tau_i\}_{i \in [0, N_n]}$ - community estimates. [Figure: the sampling window, with sides labeled $\sqrt{n}$.]
