CNRS - UPMC, Laboratoire d'Informatique de Paris 6
Proximity measures applied to community detection in complex networks
Maximilien Danisch
Thesis supervised by: Jean-Loup Guillaume and Bénédicte Le Grand
Complex networks team, LIP6-CNRS-UPMC
CRI, Univ. Paris 1
December 15, 2015
Complex networks
Definition
Network/Graph: a set of nodes linked by edges.
Networks    Nodes       Edges
Facebook    profiles    friendship
Internet    computers   connections
Web         web pages   hyperlinks
P2P         peers       file exchanges
Remark
• more and more networks
• larger and larger networks
• networks share common properties
⇒ need for fast and generic algorithms to understand them and extract knowledge.
Maximilien Danisch — Proximity measures and communities — December 15, 2015
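To make the definition concrete, here is a minimal sketch of one common way to hold such a network in memory: an adjacency dictionary mapping each node to the set of its neighbors. The function name `build_adjacency` and the sample edges are illustrative assumptions, not taken from the slides, and the graph is assumed undirected.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected graph as {node: set of neighbors} from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


# Hypothetical friendship edges between profiles.
friendships = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]
graph = build_adjacency(friendships)
print(sorted(graph["carol"]))
```

Sets make neighbor lookups and intersections cheap, which matters for the neighborhood-based measures discussed later in the talk.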
Why detect groups of nodes?
Organization: Classification of documents (e.g. Wikipedia pages). Organization of friends' lists on Facebook.
Recommendation: "People you may know" on Facebook or LinkedIn. "You may also like..." on Amazon.
Prediction: What is this unknown P2P file? What is the function of this unknown protein?
Group of nodes = Community
Some definitions
• a set of nodes that are similar,
• a set of nodes that are close to one another,
• a set of nodes highly connected inside, but poorly connected to the outside.
Remark
Different views of communities:
• overlapping / partition
• global / local
Positioning
Survey articles
• Partition: Fortunato's survey 2010 (500 references) ⇒ A lot of work has been done.
• Overlapping communities: Xie et al. 2013 ⇒ More realistic than partitions, but less work has been done and methods do not scale.
• From local to global: Kanawati 2014 ⇒ If you can solve the problem locally, you may be able to solve it globally.
Position
⇒ We chose to work on overlapping communities with a local approach in order to
• be realistic and
• be able to handle large networks.
Community detection: how?
Observation
Local community detection is done mainly using greedy heuristics to optimize an ad hoc quality function.
Examples of quality functions
• $rd(S) = \frac{l_i}{l_i + l_o}$
• $C(S) = \frac{\triangle_i(S)}{\binom{|S|}{3}} \times \frac{\triangle_i(S)}{\triangle_i(S) + \triangle_o(S)}$ (Friggeri et al. 2011)
Problems:
1. design of the quality function is difficult,
2. optimization can be trapped in local optima.
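The two quality functions can be evaluated on a toy graph with the sketch below. It rests on assumptions the slide does not spell out: l_i and l_o are read as the numbers of edges inside S and leaving S, △_i(S) as the number of triangles with all three nodes in S, and △_o(S) as the number of triangles with exactly two nodes in S. The function names and the example graph are illustrative.

```python
from itertools import combinations


def relative_density(adj, S):
    """rd(S) = l_i / (l_i + l_o), with l_i internal and l_o outgoing edges (assumed reading)."""
    S = set(S)
    # Each internal edge is seen from both endpoints, hence the division by 2.
    l_i = sum(1 for u in S for v in adj[u] if v in S) // 2
    l_o = sum(1 for u in S for v in adj[u] if v not in S)
    total = l_i + l_o
    return l_i / total if total else 0.0


def cohesion(adj, S):
    """C(S): triangle density inside S times the share of triangles kept inside S."""
    S = set(S)
    n = len(S)
    if n < 3:
        return 0.0
    nodes = list(S)
    # Triangles with all three nodes in S.
    tri_in = sum(
        1
        for u, v, w in combinations(nodes, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )
    if tri_in == 0:
        return 0.0
    # Triangles with exactly two nodes in S: an internal edge plus an outside common neighbor.
    tri_out = sum(
        sum(1 for w in adj[u] & adj[v] if w not in S)
        for u, v in combinations(nodes, 2)
        if v in adj[u]
    )
    possible = n * (n - 1) * (n - 2) / 6  # binomial(|S|, 3)
    return (tri_in / possible) * (tri_in / (tri_in + tri_out))


# Toy graph: a triangle {0, 1, 2} plus node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(relative_density(adj, {0, 1, 2}))  # 3 internal edges, 1 outgoing edge
print(cohesion(adj, {0, 1, 2}))
```

The brute-force triangle count is only suitable for small sets. A greedy heuristic of the kind mentioned on the slide would call such a function repeatedly while adding or removing one node at a time, which is where getting trapped in a local optimum can occur.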
Let's not use quality functions!
Idea
A community can be defined as a set of nodes close to each other.
⇒ Let's try to use proximity measures instead of quality functions to find communities.
Outline / Contributions
1. How to use proximity measures for community detection
2. Proposal of two proximity measures:
• propagated opinion
• Katz+
3. Framework to:
• find the communities of a given node (local)
• find overlapping communities (global)
• complete a set of nodes into a community
Networks and communities
[Figure-only slide]
Ego-centered community
[Figure-only slides, 10/56 and 11/56]
On a small visualizable network
[Figure: proximity (0 to 0.025) as a function of the rank of the node (0 to 300)]
Wikipedia category = community
[Figure, node of interest: Magnus Carlsen. Two plots against the rank of the node (log scale): proximity to the node (log scale), and number of top-k nodes in the category Chess (0 to 4000)]
Ego-centered community
[Figure-only slides, 14/56 and 15/56]
On a small visualizable network
[Figure: proximity (0 to 1.0) as a function of the rank of the node (0 to 400)]
Several behaviors in Wikipedia:
[Figure: proximity as a function of the rank of the node (log-log scale), with four curves: sharp transition, smooth transition, deformed power-law, perfect power-law]
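One simple way to exploit a sharp transition in such a curve is sketched below: sort the nodes by decreasing proximity to the node of interest and cut where the ratio between two consecutive values is largest. This is only an illustrative criterion assumed here, not necessarily the one used in the thesis, and the function name `cut_at_sharpest_drop` and the sample values are hypothetical.

```python
def cut_at_sharpest_drop(proximity):
    """Rank nodes by decreasing proximity and cut at the largest consecutive ratio.

    Returns (nodes before the cut, ratio at the cut). If the curve never
    decreases, all nodes are returned with a ratio of 1.0.
    """
    ranked = sorted(proximity, key=proximity.get, reverse=True)
    values = [proximity[u] for u in ranked]
    best_k, best_ratio = len(ranked), 1.0
    for k in range(1, len(values)):
        if values[k] > 0:
            ratio = values[k - 1] / values[k]
        else:
            # A drop to zero counts as an infinitely sharp transition.
            ratio = float("inf") if values[k - 1] > 0 else 1.0
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return ranked[:best_k], best_ratio


# Hypothetical proximities to a node of interest: a plateau, then a sharp decrease.
prox = {"a": 1.0, "b": 0.9, "c": 0.85, "d": 0.05, "e": 0.04}
members, ratio = cut_at_sharpest_drop(prox)
print(members, ratio)
```

A small ratio at the best cut would correspond to the smooth or power-law behaviors, where no clear community boundary shows up around the node.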
Ego-centered community
[Figure-only slides, 18/56 to 20/56]
Multi-ego-centered community
For each node: minimum of the two proximities
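The minimum rule can be sketched as follows, assuming the proximity of every node to each ego has already been computed and stored in a dictionary. The function name `multi_ego_proximity` and the sample values are illustrative assumptions. A node scores high only if it is close to all egos at once.

```python
def multi_ego_proximity(*proximities):
    """Combine per-ego proximity maps by taking, for each node, the minimum value.

    Nodes missing from one of the maps are treated as having proximity 0 to that ego.
    """
    nodes = set().union(*proximities)
    return {u: min(p.get(u, 0.0) for p in proximities) for u in nodes}


# Hypothetical proximities of three nodes to two egos.
prox_to_ego1 = {"x": 0.9, "y": 0.5, "z": 0.1}
prox_to_ego2 = {"x": 0.1, "y": 0.4, "z": 0.8}

combined = multi_ego_proximity(prox_to_ego1, prox_to_ego2)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking[0], combined[ranking[0]])
```

Here "x" is close to the first ego only and "z" to the second only, so "y", which is reasonably close to both, comes out on top.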
Multi-ego-centered communities
[Figure-only slide]
Bi-ego-centered communities
In Wikipedia: Folk wrestling + Torii school = Sumo
[Figure: two plots of proximity as a function of the rank of the node (log-log scale). First plot: curves for Torii school and Folk wrestling. Second plot: curves for Sumo, Minimum and Rescaled minimum]
Which proximity measure?
Classical proximity measures
• distance: number of hops between two nodes
• number of common neighbors between two nodes
• Katz index
• commute time
• hitting time
• rooted PageRank
Remark
• the measure needs to be discriminative
• the proximity of all nodes to the node of interest must be fast to compute
• with or without parameters?
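Three of the classical measures listed above can be sketched in a few lines on an adjacency dictionary. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the graph is undirected and unweighted, rooted PageRank is approximated by a fixed number of power iterations, and the damping value `alpha=0.85`, the iteration count and all function names are choices made here.

```python
from collections import deque


def hop_distance(adj, source):
    """Number of hops from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def rooted_pagerank(adj, source, alpha=0.85, iterations=100):
    """Random walk that returns to source with probability 1 - alpha at each step."""
    score = {u: 0.0 for u in adj}
    score[source] = 1.0
    for _ in range(iterations):
        nxt = {u: 0.0 for u in adj}
        for u, mass in score.items():
            nxt[source] += (1 - alpha) * mass  # restart at the node of interest
            if adj[u]:
                share = alpha * mass / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                nxt[source] += alpha * mass  # isolated node: send the walker back
        score = nxt
    return score


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(hop_distance(adj, 0))
print(common_neighbors(adj, 0, 1))
print(rooted_pagerank(adj, 0))
```

On this toy graph the hop distance only takes the values 0 to 3 and common neighbors are zero for most pairs, which illustrates the remark about discriminative power: rooted PageRank gives a graded score to every node in one pass over the edges per iteration.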