
The ground truth about metadata and community detection in networks - PowerPoint PPT Presentation



  1. The ground truth about metadata and community detection in networks. Leto Peel, Université catholique de Louvain

  2. Community detection: split nodes into groups based on their pattern of links

  3. Data generating process: generate nodes and assign them to communities

  4. Data generating process: generate nodes and assign them to communities, T. Generate links in G dependent on community membership, G = g(T)
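The two-step process in slides 3 and 4 can be sketched in a few lines. This is a minimal illustration, assuming the simplest possible g: a planted-partition model in which every within-community pair links with probability `p_in` and every between-community pair with probability `p_out`. The function name and parameters are hypothetical, not taken from the slides.

```python
import random

def sample_planted_partition(sizes, p_in, p_out, seed=None):
    """Sample a ground truth T and a graph G = g(T) from a planted-partition model.

    sizes: number of nodes in each community (this fixes T).
    p_in, p_out: edge probability within and between communities.
    Returns (labels, edges): labels[i] is node i's community, and edges
    is a list of (i, j) pairs with i < j.
    """
    rng = random.Random(seed)
    # Step 1: generate nodes and assign each one to a community.
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    # Step 2: generate each possible link independently, with a probability
    # that depends only on the communities of its two endpoints.
    edges = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges


# Two planted communities of 10 nodes, denser inside than between.
labels, edges = sample_planted_partition([10, 10], p_in=0.5, p_out=0.05, seed=1)
```

Setting `p_in` equal to `p_out` removes all community structure and yields an Erdős–Rényi random graph, the structureless case (iii) in slide 15.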

  5. Community detection: observe G. Infer T via f(G). Assess performance on how well we recover T

  6. Ground truth in real networks?

  7. Networks can have metadata that describe the nodes: social networks (age, sex, ethnicity, race, etc.); food webs (feeding mode, species body mass, etc.); internet (data capacity, physical location, etc.); protein interactions (molecular weight, association with cancer, etc.)

  8. Recovering metadata implies sensible methods. Panels: stochastic block model; stochastic block model with degree correction. Karrer, Newman. Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011). Adamic, Glance. The political blogosphere and the 2004 US election: divided they blog. 36–43 (2005).

  9. Metadata often treated as ground truth. Yang & Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach (2013).

  10. Metadata often treated as ground truth. "Do you think that's ground truth you're detecting?" Yang & Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach (2013).

  11. Ground truth, T. Communities, C = f(G). d(T, f(G))

  12. Metadata, M. Ground truth, T. Communities, C = f(G). d(M, T), d(M, f(G)), d(T, f(G))
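Slides 11 and 12 compare partitions with a distance d but do not say which one. As a hedged illustration, the sketch below assumes d is the variation of information, VI(X, Y) = H(X) + H(Y) - 2 I(X; Y), which is a true metric on partitions and is zero exactly when two partitions agree up to relabeling. The function names are hypothetical; slide 31 uses adjusted mutual information instead, which additionally corrects for chance.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of the partition given by a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(x, y):
    """Mutual information (in nats) between two partitions of the same nodes."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # sum over p(a, b) * log(p(a, b) / (p(a) p(b))), written with raw counts
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def variation_of_information(x, y):
    """Assumed choice of d: VI(X, Y) = H(X) + H(Y) - 2 I(X; Y)."""
    if len(x) != len(y):
        raise ValueError("both partitions must label the same nodes")
    return entropy(x) + entropy(y) - 2 * mutual_information(x, y)


# d(M, C) for a toy set of node metadata M and detected communities C.
metadata = ["a", "a", "b", "b", "b"]
communities = [0, 0, 0, 1, 1]
print(variation_of_information(metadata, communities))
```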

  13. When communities ≠ metadata... (i) the metadata do not relate to the network structure,

  14. When communities ≠ metadata... (ii) the detected communities and the metadata capture different aspects of the network's structure,

  15. When communities ≠ metadata... (iii) the network contains no structure (e.g., an Erdős–Rényi (E-R) random graph),

  16. When communities ≠ metadata... (iv) the community detection algorithm does not perform well. Typically we assume this is the only possible cause.

  17. The Karate Club network: Instructor, President. Split into factions

  18. The Karate Club network: Instructor, President. Split into factions

  19. ‘This can be explained by noting that he was only three weeks away from a test for black belt (master status) when the split in the club occurred. Had he joined the officers’ [President's] club he would have had to give up his rank and begin again in a new style of karate with a white (beginner’s) belt, since the officers had decided to change the style of karate practiced in their new club.’ - Zachary 1977

  20. You only see what you look for... US politics is more than two opposing views. Adamic, Glance. The political blogosphere and the 2004 US election: divided they blog. 36–43 (2005). Peixoto, T. P. Hierarchical Block Structures and High-Resolution Model Selection in Large Networks. Phys. Rev. X 4, 011047 (2014).

  21. Different generative processes = different community structures

  22. Many good partitions... Evans, T. S. Clique graphs and overlapping communities. J. Stat. Mech. 2010, P12037–22 (2010).

  23. Metadata are not ground truth for community detection

  24. Metadata are not ground truth for community detection. No interpretability of negative results: (i) M unrelated to network structure; (ii) C and M capture different aspects of network structure; (iii) the network has no structure; (iv) the algorithm does not perform well.

  25. Metadata are not ground truth for community detection. No interpretability of negative results: (i) M unrelated to network structure; (ii) C and M capture different aspects of network structure; (iii) the network has no structure; (iv) the algorithm does not perform well. Multiple sets of metadata exist. Which set is ground truth?

  26. Metadata are not ground truth for community detection. No interpretability of negative results: (i) M unrelated to network structure; (ii) C and M capture different aspects of network structure; (iii) the network has no structure; (iv) the algorithm does not perform well. Multiple sets of metadata exist. Which set is ground truth? We see what we look for. Confirmation bias. Publication bias.

  27. Metadata are not ground truth for community detection. No interpretability of negative results: (i) M unrelated to network structure; (ii) C and M capture different aspects of network structure; (iii) the network has no structure; (iv) the algorithm does not perform well. Multiple sets of metadata exist. Which set is ground truth? We see what we look for. Confirmation bias. Publication bias. “Community” is model dependent. Do we expect all networks across all domains to have the same relationship with communities?

  28. Community detection is an inverse problem. Data generation: communities T → network G = g(T). Community detection: network G → communities f(G)

  29. However, in real networks both T and g are unknown. For any graph on n nodes there are B_n (the n-th Bell number) possible “ground truth” partitions, and an infinite number of generative models capable of producing it. {generative models, g} × {partitions, T} → {graph G} is many-to-one. The community detection problem is ill-posed (no unique solution).
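To give a sense of scale for the Bell-number count in slide 29, the short sketch below computes B_0, ..., B_n with the Bell triangle: each row starts with the last entry of the previous row, and each further entry is its left neighbour plus the entry above that neighbour. The function name is hypothetical; the numbers themselves are the standard Bell numbers.

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], the number of partitions of an n-element set."""
    bells = [1]  # B_0 = 1: the empty set has exactly one partition
    row = [1]
    for _ in range(n_max):
        # New row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each further entry = its left neighbour + the entry above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


print(bell_numbers(10))
```

Even a 10-node graph already admits 115,975 candidate partitions, and the count grows faster than exponentially in n.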

  30. A No Free Lunch theorem for community detection? The NFL theorem (supervised learning) states that there cannot exist a classifier that is a priori better than any other, averaged over all possible problems. Wolpert, D. H. The lack of a priori distinctions between learning algorithms. Neural Computation 8, 1341–1390 (1996).

  31. A No Free Lunch theorem for community detection. NFL theorem for community detection (paraphrased): for the community detection problem, with accuracy measured by adjusted mutual information, the uniform average of the accuracy of any method f over all possible community detection problems is a constant that is independent of f. On average, no community detection algorithm performs better than any other.

  32. arXiv:1608.05878

  33. So, what about metadata? Metadata = types of nodes. Communities = how nodes interact. Metadata + communities = how different types of nodes interact with each other. We require new methods to understand the relationship between metadata and structure.

  34. Are the metadata related to the network structure? Blockmodel Entropy Significance Test. Do metadata and detected communities capture different aspects of network structure? neoSBM

  35. Are the metadata related to the network structure? Blockmodel Entropy Significance Test, for case (i): the metadata do not relate to the network structure. Do metadata and detected communities capture different aspects of network structure? neoSBM, for case (ii): communities and metadata capture different aspects of network structure.
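The slides only name the Blockmodel Entropy Significance Test. A minimal Monte Carlo sketch of the idea, under stated assumptions: "entropy" is taken to be the negative log-likelihood of a simple-graph Bernoulli stochastic blockmodel at its maximum-likelihood block densities, and the metadata partition is compared against random permutations of the same labels (which preserve group sizes). A small p-value suggests the metadata relate to the network structure. The function names and the add-one p-value are illustrative choices, not the published implementation, which may use a different model variant or p-value computation.

```python
import math
import random
from collections import Counter

def _xlogx(x):
    # Convention 0 * log 0 = 0.
    return x * math.log(x) if x > 0 else 0.0

def sbm_entropy(labels, edges):
    """Assumed entropy: -log P(G | z, omega_hat) for a simple undirected Bernoulli SBM.

    labels must be mutually comparable (e.g. all ints or all strings).
    """
    sizes = Counter(labels)
    m = Counter()  # m[(r, s)] = number of edges between groups r <= s
    for i, j in edges:
        r, s = sorted((labels[i], labels[j]))
        m[(r, s)] += 1
    groups = sorted(sizes)
    h = 0.0
    for a, r in enumerate(groups):
        for s in groups[a:]:
            # Number of node pairs in block (r, s).
            pairs = sizes[r] * (sizes[r] - 1) // 2 if r == s else sizes[r] * sizes[s]
            if pairs == 0:
                continue
            w = m[(r, s)] / pairs  # maximum-likelihood block density
            h -= pairs * (_xlogx(w) + _xlogx(1 - w))
    return h

def entropy_permutation_test(labels, edges, n_samples=1000, seed=None):
    """Return (observed entropy, Monte Carlo p-value over label permutations)."""
    rng = random.Random(seed)
    observed = sbm_entropy(labels, edges)
    shuffled = list(labels)
    at_most = 0
    for _ in range(n_samples):
        rng.shuffle(shuffled)
        if sbm_entropy(shuffled, edges) <= observed:
            at_most += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_most + 1) / (n_samples + 1)
```

Shuffling labels rather than drawing arbitrary partitions keeps the number and sizes of metadata groups fixed, so the comparison asks only whether this particular assignment of labels to nodes explains the edges unusually well.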

  36. The Stochastic Blockmodel. Edges are conditionally independent given community membership: $p_{ij} = p(e_{ij} \mid z_i, z_j, \omega) = \omega_{z_i z_j}$. [Figure: intra-community density vs. inter-community density, shaded by increasing density]
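Because edges are conditionally independent given the memberships z, the likelihood of a graph factorizes over node pairs. A short derivation, assuming the simple undirected (Bernoulli) variant with edge indicators e_ij ∈ {0, 1}: grouping pairs by block, with m_rs the number of edges between groups r and s and n_rs the number of node pairs between them (n_r n_s for r ≠ s, n_r(n_r - 1)/2 for r = s), gives

```latex
P(G \mid z, \omega)
  = \prod_{i<j} \omega_{z_i z_j}^{e_{ij}} \left(1 - \omega_{z_i z_j}\right)^{1 - e_{ij}}
  = \prod_{r \le s} \omega_{rs}^{m_{rs}} \left(1 - \omega_{rs}\right)^{n_{rs} - m_{rs}},
\qquad
\hat{\omega}_{rs} = \frac{m_{rs}}{n_{rs}}
```

The maximum-likelihood density of each block is simply its observed edge density, and the negative log-likelihood at these values is one natural notion of the "blockmodel entropy" of a partition.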
