Unsupervised Machine Learning and Data Mining
DS 5230 / DS 4420, Fall 2018
Lecture 17, Jan-Willem van de Meent

Community Detection
Problem: Can we identify groups of densely connected nodes? (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Communities: Football Conferences
Nodes: football teams. Edges: matches. Communities: conferences. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Communities: Academic Citations
Source: Citation networks and Maps of science [Börner et al., 2012]
Nodes: journals. Edges: citations. Communities: academic disciplines. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Communities: Protein-Protein Interactions
Nodes: proteins. Edges: physical interactions. Communities: functional modules. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Community Detection
[Figure panels: Graph partitioning; Overlapping communities]
We will work with undirected (unweighted) networks. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Centrality Measures
[Figure (a): centrality illustration on a 17-node graph, marking the nodes with the highest betweenness, closeness, and degree centrality]
• Betweenness: number of shortest paths passing through a node
• Closeness: based on the average distance to the other nodes (a smaller average distance means a more central node)
• Degree: number of connections to other nodes
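Degree and closeness can both be computed with breadth-first search on an unweighted graph. The sketch below assumes a small hypothetical 7-node example graph (the edge list is our own choice, not read off the slide's figure) and defines closeness as the reciprocal of the average distance, so that larger values mean more central; the slide itself only says closeness is based on the average distance.

```python
from collections import deque

# Hypothetical 7-node undirected example graph (an assumption for illustration).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    """Undirected adjacency list as a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Degree: number of connections to other nodes."""
    return {u: len(nbrs) for u, nbrs in adj.items()}


def closeness_centrality(adj):
    """Closeness: reciprocal of the average distance to the other reachable nodes."""
    scores = {}
    for u in adj:
        dists = [d for v, d in bfs_distances(adj, u).items() if v != u]
        scores[u] = len(dists) / sum(dists) if dists else 0.0
    return scores


adj = adjacency(EDGES)
deg = degree_centrality(adj)
clo = closeness_centrality(adj)
print(max(deg, key=deg.get), deg["D"])  # D 4
print(max(clo, key=clo.get), clo["D"])  # D 0.75
```

In this example the same node (D) scores highest on both measures; in general the three measures can single out different nodes, as the figure illustrates.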

Betweenness
[Figure panels: Edge strength (call volume); Edge betweenness]
• Betweenness: number of shortest paths passing through a node or edge (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Edge Betweenness
[Figure: example graph on nodes A to G, with each edge labeled by its edge betweenness]
• Count the number of shortest paths passing through each edge (this can also be done with weighted edges)
• If there are multiple shortest paths of equal length, split the count equally among them
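The split-count rule can be checked by brute force: enumerate every shortest path between every pair of nodes and give each path an equal share of that pair's single unit of credit. This is a minimal sketch for unweighted graphs only, assuming a hypothetical 7-node edge list. Because the number of shortest paths can grow exponentially, the breadth-first credit scheme on the Calculating Betweenness slide is the practical alternative.

```python
from collections import deque
from itertools import combinations

# Hypothetical example graph (assumed edge list, for illustration only).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def all_shortest_paths(adj, s, t):
    """Every shortest s-t path: BFS records all parents, then walk back from t."""
    dist, parents = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], parents[v] = dist[u] + 1, [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                parents[v].append(u)
    paths = []

    def walk(node, suffix):
        if node == s:
            paths.append([s] + suffix)
        else:
            for p in parents[node]:
                walk(p, [node] + suffix)

    if t in dist:
        walk(t, [])
    return paths


def edge_betweenness_bruteforce(edges):
    """Each node pair spreads one unit of credit evenly over its shortest paths."""
    adj = adjacency(edges)
    score = {frozenset(e): 0.0 for e in edges}
    for s, t in combinations(sorted(adj), 2):  # each unordered pair once
        paths = all_shortest_paths(adj, s, t)
        for path in paths:
            for u, v in zip(path, path[1:]):
                score[frozenset((u, v))] += 1 / len(paths)
    return score


scores = edge_betweenness_bruteforce(EDGES)
print(scores[frozenset(("B", "D"))])  # 12.0
```

Here E and G are joined by two shortest paths (through D and through F), so each of those four edges receives 0.5 from that pair. Since every unordered pair is visited once, no division by 2 is needed.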

Girvan-Newman Algorithm (hierarchical divisive clustering according to betweenness)
[Figure: example network with edge-betweenness values]
Repeat until k clusters are found:
1. Calculate the betweenness of every edge.
2. Remove the edge(s) with the highest betweenness.
(Adapted from: Mining of Massive Datasets, http://www.mmds.org)
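A minimal sketch of this loop, assuming an unweighted, undirected graph given as an edge list. Betweenness is recomputed after every removal (removing an edge changes the shortest paths), all edges tied for the highest betweenness are removed together, and the loop stops once there are at least k connected components. The helper names and the example edge list are hypothetical.

```python
from collections import deque

# Hypothetical example graph (assumed edge list, for illustration only).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """BFS from every node, propagate credit upwards, then halve the totals."""
    score = {}
    for root in adj:
        dist, sigma, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v] = dist[u] + 1, 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
        credit = {v: 1.0 for v in order}
        for v in reversed(order):
            for u in adj[v]:
                if dist[u] == dist[v] - 1:  # u is a BFS parent of v
                    share = credit[v] * sigma[u] / sigma[v]
                    key = frozenset((u, v))
                    score[key] = score.get(key, 0.0) + share
                    credit[u] += share
    return {e: s / 2 for e, s in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            for v in adj[stack.pop()]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(edges, k):
    """Remove highest-betweenness edge(s) until there are at least k components."""
    adj = adjacency(edges)
    while len(components(adj)) < k:
        scores = edge_betweenness(adj)
        if not scores:  # no edges left to remove
            break
        top = max(scores.values())
        for e, s in scores.items():
            if s >= top - 1e-9:  # remove all edges tied for the maximum
                u, v = tuple(e)
                adj[u].discard(v)
                adj[v].discard(u)
    return components(adj)


print(girvan_newman(EDGES, 2))
```

Because tied edges are removed together, a single round can jump past k components; asking for exactly k would require breaking ties one edge at a time.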

Girvan-Newman Algorithm (hierarchical divisive clustering according to betweenness)
[Figure panels: successive edge-removal steps; resulting hierarchical network] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Girvan-Newman: Physics Citations (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Girvan-Newman: Two Problems
1. How can we compute the betweenness for all edges?
2. How can we choose the number of components k?
(Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Calculating Betweenness
How can we count all shortest paths?
• Loop over the nodes in the graph
• From each node, perform a breadth-first search to find the shortest paths to all other nodes
• Increment the counts of the edges traversed by these shortest paths
• Divide the final betweenness by 2 (since every path is counted twice, once from each endpoint)

Counting Shortest Paths
[Figure panels, breadth-first search from node E over the example graph (nodes A to G): (1) count the number of shortest paths from E to each node; (2) accumulate credit upwards, dividing it across shortest paths] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
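The two passes illustrated above can be sketched as follows, assuming an unweighted, undirected graph given as an edge list; the example edge list is a hypothetical reconstruction of the pictured graph, not something the text confirms. `credit_from_root` returns the path counts (step 1) and the per-edge credit (step 2) for one root; `edge_betweenness` then follows the Calculating Betweenness slide by looping over every root and halving the totals.

```python
from collections import deque

# Hypothetical reconstruction of the example graph (assumed edge list).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def credit_from_root(adj, root):
    """Step 1: count shortest paths from root. Step 2: push credit upwards."""
    # Step 1: BFS; sigma[v] = number of shortest paths from root to v.
    dist, sigma, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    # Step 2: each node holds credit 1 plus what its children pass up, and
    # divides that credit among its parents in proportion to their sigma.
    node_credit = {v: 1.0 for v in order}
    edge_credit = {}
    for v in reversed(order):  # deepest nodes first
        for u in adj[v]:
            if dist[u] == dist[v] - 1:
                share = node_credit[v] * sigma[u] / sigma[v]
                edge_credit[frozenset((u, v))] = share
                node_credit[u] += share
    return sigma, edge_credit


def edge_betweenness(edges):
    """Sum the per-root credits over all roots, then divide by 2."""
    adj = adjacency(edges)
    total = {frozenset(e): 0.0 for e in edges}
    for root in adj:
        _, credit = credit_from_root(adj, root)
        for e, c in credit.items():
            total[e] += c
    return {e: c / 2 for e, c in total.items()}  # each path counted from both ends


sigma, credit = credit_from_root(adjacency(EDGES), "E")
print(sigma["G"], credit[frozenset(("D", "E"))])  # 2 4.5
print(edge_betweenness(EDGES)[frozenset(("B", "D"))])  # 12.0
```

Edges between two nodes at the same BFS level (here A to C, seen from E) receive no credit from that root, because no shortest path from the root uses them.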

Counting Paths: Larger Example
[Figure panels: Original graph; Breadth-first ordering from A] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Counting Paths: Larger Example
Step 1. Count the number of shortest paths from the root (A) to each node. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Counting Paths: Larger Example
[Figure annotation: 1 path to K; split in ratio 3:3]
Step 2. Propagate credit upwards, splitting it according to the number of shortest paths to each parent. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Counting Paths: Larger Example
[Figure annotations: 1+0.5 paths to J, split 1:2; 1 path to K, split in ratio 3:3]
Step 2. Propagate credit upwards, splitting it according to the number of shortest paths to each parent. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
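Written as a formula (our notation, not the slide's): let σ_v be the number of shortest paths from the root to v, which is the sum of σ_p over the BFS parents p of v. Each node holds credit 1 plus everything its children pass up, and gives each parent a share proportional to that parent's path count:

```latex
\sigma_v = \sum_{p \in \mathrm{parents}(v)} \sigma_p ,
\qquad
c(p, v) = \frac{\sigma_p}{\sigma_v}\Bigl(1 + \sum_{w \,:\, v \in \mathrm{parents}(w)} c(v, w)\Bigr)
```

Reading the slide's annotations this way: K is a leaf with credit 1 and two parents with 3 paths each, so σ_K = 6 and each parent edge receives 1 × 3/6 = 0.5 (the 3:3 split). J holds 1 + 0.5 = 1.5 and its parents have 1 and 2 paths, so σ_J = 3 and its parent edges receive 1.5 × 1/3 = 0.5 and 1.5 × 2/3 = 1 (the 1:2 split).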

Determining the Number of Communities
[Figure panels: Hierarchical decomposition; Choosing a cut-off]
This is analogous to deciding on the number of clusters in hierarchical clustering. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)

Modularity
Idea: Compare the fraction of edges within each module to the fraction that would be expected if edges were placed at random.
[Equation with annotated terms: adjacency matrix, node degree, node assignment] (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
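A standard way to write this comparison is the Newman-Girvan modularity, Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, c_i its community assignment, and m the number of edges; we assume this matches the slide's annotated equation. The term k_i k_j / (2m) is the expected number of edges between i and j if edges were rewired at random while keeping every node's degree. The sketch evaluates the double sum literally on a hypothetical example graph.

```python
# Hypothetical example graph (assumed edge list, for illustration only).
NODES = ["A", "B", "C", "D", "E", "F", "G"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def modularity(nodes, edges, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    m = len(edges)
    # Adjacency matrix A (symmetric, 0/1) stored as a dict keyed by node pairs.
    A = {(i, j): 0 for i in nodes for j in nodes}
    for u, v in edges:
        A[u, v] = A[v, u] = 1
    # Node degree k_i = row sum of A.
    k = {i: sum(A[i, j] for j in nodes) for i in nodes}
    q = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] == community[j]:  # delta(c_i, c_j)
                q += A[i, j] - k[i] * k[j] / (2 * m)
    return q / (2 * m)


split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1, "G": 1}
together = {node: 0 for node in NODES}
print(round(modularity(NODES, EDGES, split), 4))  # 0.3642
print(modularity(NODES, EDGES, together))
```

Putting every node into one community gives Q = 0 (up to floating-point rounding), since the observed and expected fractions of within-module edges are then both 1.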

Modularity
Use modularity to optimize connectivity within modules. (Adapted from: Mining of Massive Datasets, http://www.mmds.org)
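One common way to use modularity to choose the cut-off (we assume this is the intended use) is to run Girvan-Newman to completion, score the partition into connected components after each round of removals by its modularity on the original graph, and keep the best-scoring partition. A minimal sketch, assuming a hypothetical edge list; modularity is computed in the equivalent per-community form Q = Σ_c [L_c / m - (d_c / 2m)^2], where L_c is the number of edges inside community c and d_c is its total degree.

```python
from collections import deque

# Hypothetical example graph (assumed edge list, for illustration only).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Girvan-Newman BFS credit scheme from every root, halved."""
    score = {}
    for root in adj:
        dist, sigma, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v] = dist[u] + 1, 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
        credit = {v: 1.0 for v in order}
        for v in reversed(order):
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = credit[v] * sigma[u] / sigma[v]
                    key = frozenset((u, v))
                    score[key] = score.get(key, 0.0) + share
                    credit[u] += share
    return {e: s / 2 for e, s in score.items()}


def component_labels(adj):
    """Map every node to the index of its connected component."""
    label, next_id = {}, 0
    for start in sorted(adj):
        if start in label:
            continue
        label[start], stack = next_id, [start]
        while stack:
            for v in adj[stack.pop()]:
                if v not in label:
                    label[v] = next_id
                    stack.append(v)
        next_id += 1
    return label


def modularity(edges, label):
    """Q = sum_c [L_c / m - (d_c / 2m)^2], evaluated on the original edges."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        for node in (u, v):
            degree[label[node]] = degree.get(label[node], 0) + 1
        if label[u] == label[v]:
            inside[label[u]] = inside.get(label[u], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def best_split(edges):
    """Run Girvan-Newman to the end; keep the partition with the highest Q."""
    adj = adjacency(edges)
    best_q, best_label = float("-inf"), None
    while True:
        label = component_labels(adj)
        q = modularity(edges, label)
        if q > best_q:
            best_q, best_label = q, label
        scores = edge_betweenness(adj)
        if not scores:  # every edge has been removed
            return best_q, best_label
        top = max(scores.values())
        for e, s in scores.items():
            if s >= top - 1e-9:  # remove all edges tied for the maximum
                u, v = tuple(e)
                adj[u].discard(v)
                adj[v].discard(u)


q, label = best_split(EDGES)
print(round(q, 4))  # 0.3642
```

Modularity is always computed against the original graph, not the graph with edges removed; otherwise the score would trivially favor whatever edges happen to remain.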
