Hyperbolic Communities: Modelling Communities Beyond Cliques Pauli Miettinen   8 June 2017

Joint work with
• Stefan Neumann (Uni Wien)
• Saskia Metzler (MPI-INF)
• Sanjar Karaev (MPI-INF)
• Stephan Günnemann (T.U. München)
• Rainer Gemulla (Uni Mannheim)
• Your picture here?

Communities = cliques
• Communities are often modelled as (quasi-)cliques
• Dense subgraphs
• All edges equally likely
• In a community, everybody knows (or should know) everybody else

Communities ≠ cliques
• But many communities are not cliques
• There is more structure
• Some people know more people
• Others know just the central people
⇒ Cliques are not a good model

Communities really are not cliques
[Figure: ≈ 160 stackexchange communities. Nodes are users, edges are comments or answers during the last year.]

Why should I care?
• Better understanding of the community structures
• Better fit to real-world data
• Better prediction power
• More realistic random graphs
• …

The core/periphery model
• Classical model in social sciences (Borgatti & Everett, 1999)
• Communities are L-shaped
• A core that is a clique
• A periphery that is only connected to the core
• In real-world data, 0s can appear both in the core and in the periphery
[Figure: core and periphery blocks (Borgatti & Everett, 1999)]

Core/periphery example
[Figure: Borgatti & Everett, 1999]

Nested matrices
• A matrix is nested if its rows and columns can be ordered so that
  • all 1s are consecutive
  • no row has more 1s than the row above it
• Important concept in ecology
• Core/periphery matrices are nested
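
The nestedness condition above can be tested mechanically. The following is a minimal sketch, not taken from the slides: the function name `is_nested` is hypothetical, and it assumes the 1s of each row are aligned to the left after reordering. It sorts rows and columns by their number of 1s and checks that every row is then a run of 1s followed by 0s.

```python
def is_nested(matrix):
    """Return True if the 0/1 matrix can be reordered into nested form."""
    if not matrix:
        return True
    n_cols = len(matrix[0])
    # Order columns by decreasing number of 1s.
    col_sums = [sum(row[c] for row in matrix) for c in range(n_cols)]
    col_order = sorted(range(n_cols), key=lambda c: -col_sums[c])
    # Order rows by decreasing number of 1s, so no row has more 1s than the row above.
    rows = sorted(matrix, key=sum, reverse=True)
    for row in rows:
        reordered = [row[c] for c in col_order]
        ones = sum(reordered)
        # In a nested ordering, each row is a block of 1s followed by 0s.
        if reordered != [1] * ones + [0] * (n_cols - ones):
            return False
    return True


# A shuffled nested matrix and a matrix with two disjoint 1s.
print(is_nested([[0, 1, 0], [1, 1, 1], [0, 1, 1]]))
print(is_nested([[1, 0], [0, 1]]))
```

Sorting by counts is enough here because in a nested matrix the row supports form a chain under inclusion, so columns with equal counts always appear together.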

HyCom communities
• Assume nodes are ordered (by degree)
• In HyCom communities, an edge (i, j) is in the community iff i^α · j^α > τ for some α ≤ 0 and 0 < τ < 1
• N.B. same as ij < τ′ with τ′ = τ^(1/α)
[Figure: example community with α = -0.5, τ = 0.1, 380 edges]
(Araujo et al., ECML PKDD ’14)
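
The equivalence of the two forms can be checked numerically. This is a small sketch under stated assumptions: the function names are hypothetical, nodes are numbered from 1 (a rank of 0 is undefined for negative α), and α is strictly negative so that τ^(1/α) exists. The parameters are chosen so that no product ij falls exactly on the boundary, where floating-point rounding could make the two tests disagree.

```python
def hycom_edge(i, j, alpha, tau):
    """Power-law form: edge (i, j) is in the community iff i^alpha * j^alpha > tau."""
    return (i ** alpha) * (j ** alpha) > tau


def hycom_edge_product(i, j, alpha, tau):
    """Product form: i * j < tau' with tau' = tau^(1/alpha), for alpha < 0."""
    return i * j < tau ** (1.0 / alpha)


# Illustrative parameters (not from the slides): tau' = 0.12^(-2), about 69.4.
alpha, tau = -0.5, 0.12
agree = all(
    hycom_edge(i, j, alpha, tau) == hycom_edge_product(i, j, alpha, tau)
    for i in range(1, 51)
    for j in range(1, 51)
)
print(agree)
```

The inequality flips direction because raising both sides to the power 1/α is a decreasing map when α is negative.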

Some comments on the models
• Core/periphery model seems too restricted
• Tails taper towards the end
• Nested model is very general
• Perhaps too general…
• HyCom seems like a good compromise
• But with only one free variable, they’re quite limited too

The hyperbolic model
• Let us assume the nodes are ordered and i = 0, 1, …
• In the hyperbolic model, edge (i, j) is in the community if (i + p)(j + p) ≤ θ
• (–p, –p) is the centre of the hyperbola
• θ places the curve in the gradient
[Figure: hyperbola with centre (–p, –p)]
(Metzler et al., ICDM 2016)
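
As an illustration of the membership rule, the sketch below builds the 0/1 pattern of a community directly from the inequality. The function name and the parameter values are hypothetical; nodes are numbered from 0 as on the slide, and the diagonal is kept as part of the pattern.

```python
def hyperbolic_community(n, p, theta):
    """Return the n x n 0/1 matrix with entry 1 iff (i + p)(j + p) <= theta."""
    return [
        [1 if (i + p) * (j + p) <= theta else 0 for j in range(n)]
        for i in range(n)
    ]


# Illustrative values only: 4 nodes, p = 1, theta = 4.
for row in hyperbolic_community(4, 1, 4):
    print(row)
# [1, 1, 1, 1]
# [1, 1, 0, 0]
# [1, 0, 0, 0]
# [1, 0, 0, 0]
```

The result is symmetric and nested: each row is a run of 1s that never gets longer further down.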

The core/tail model
• The core/tail model is parameterized by the size of the core, γ, and the thickness of the tail, H
• Tail cannot be thicker than the core
• In fact γ ≤ (n – 1 + H)/2
[Figure: community with core size γ and tail thickness H]

The mixture model
• HyCom: i∙j ≤ τ
• Line: i + j ≤ σ
• Mixture: (1 – x)(i∙j) + x(i + j) ≤ Σ for 0 ≤ x ≤ 1 and Σ ∈ ℝ
• Actually (1 – |x|)(i∙j) + x(i + j) ≤ Σ for –1 ≤ x ≤ 1
• Slightly more general
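
The general mixture rule translates directly into a membership test. A minimal sketch, with a hypothetical function name and with `sigma` standing in for Σ; setting x = 0 recovers the product rule and x = 1 the line rule.

```python
def mixture_edge(i, j, x, sigma):
    """General mixture: (1 - |x|) * i * j + x * (i + j) <= sigma, for -1 <= x <= 1."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    return (1 - abs(x)) * i * j + x * (i + j) <= sigma


print(mixture_edge(3, 4, 0, 12))  # x = 0: product rule, 3 * 4 <= 12
print(mixture_edge(3, 4, 1, 7))   # x = 1: line rule, 3 + 4 <= 7
```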

All the models are the same
• Equivalence Theorem: Given a valid pair of parameters for one of the three models above, there exist valid pairs of parameters for the other two models that will model exactly the same graph
• Between hyperbola and core/tail, this is straightforward (re-parametrization)
• Requires a proof between mixture and hyperbola
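
One direction of the mixture/hyperbola correspondence can be sketched by completing the product. This is only the algebraic step, assuming |x| < 1 so that dividing by 1 − |x| is allowed; it does not address which parameter pairs are valid, which the theorem also covers.

```latex
\begin{align*}
(1-|x|)\,ij + x\,(i+j) &\le \Sigma \\
ij + p\,(i+j) &\le \frac{\Sigma}{1-|x|}, \qquad p = \frac{x}{1-|x|} \\
(i+p)(j+p) &\le \frac{\Sigma}{1-|x|} + p^{2} = \theta
\end{align*}
```

The last line has the form of the hyperbolic model, with centre (−p, −p).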

Hyperbolic vs. cliques and core/tail
• Cliques are a special case of the models
• E.g. set core size to the community size
• Core/periphery is also a special case of the model
• Or a limit case (needs to be checked)

Hyperbolic vs. power-law
• HyCom is a special case of our models
• Recall mixture: (1 – x)(i∙j) + x(i + j) ≤ Σ
• Technically HyCom is 1/2(i∙j) + 1/2(i + j) ≤ Σ
[Figure: two example communities on 100 nodes]
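
The coefficient 1/2 is consistent with a shift of indices. The following sketch assumes that HyCom numbers the nodes from 1 while the models here number them from 0; the slide does not state this, so treat it as one possible reading. Under that assumption the HyCom product rule on the shifted indices becomes:

```latex
\begin{align*}
(i+1)(j+1) &\le \tau \\
ij + (i+j) + 1 &\le \tau \\
\tfrac{1}{2}\,ij + \tfrac{1}{2}\,(i+j) &\le \frac{\tau-1}{2} = \Sigma
\end{align*}
```

This is the mixture model with x = 1/2.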

Some example communities
[Figure: examples]

On likelihoods
• Area under the curve should be dense
• Area above the curve should be sparse
• Maximize the log-likelihood |E|log(d) + |∁E|log(1–d) + |O|log(s) + |∁O|log(1–s)
• |E| = edges in comm., |∁E| = non-edges in comm., d = density of comm., |O| = edges outside of comm., |∁O| = non-edges outside of comm., s = density outside comm.
[Figure: adjacency matrix with the dense area under the curve and the sparse area above it]
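
The log-likelihood can be computed from the four counts alone. This is a minimal sketch, assuming the densities d and s are estimated as the fraction of edges in their respective areas; the function name is hypothetical, and an area that is empty, full, or has no pairs at all is given a contribution of 0.

```python
import math


def log_likelihood(edges_in, nonedges_in, edges_out, nonedges_out):
    """Bernoulli log-likelihood of one community, densities estimated from counts."""

    def block(ones, zeros):
        total = ones + zeros
        if total == 0 or ones == 0 or zeros == 0:
            # Empty or constant area: the fitted density explains it perfectly.
            return 0.0
        dens = ones / total
        return ones * math.log(dens) + zeros * math.log(1 - dens)

    return block(edges_in, nonedges_in) + block(edges_out, nonedges_out)


# A well-separated community scores higher than an uninformative split.
print(log_likelihood(9, 1, 1, 9) > log_likelihood(5, 5, 5, 5))
```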

More than one community
• Generalizing to many communities is mostly straightforward
• Every community has its own parameters
• Likelihoods inside communities sum up
• The area outside all communities should be sparse
• But overlapping communities add complexity…

Overlapping communities!
[Figure annotations: This is a good community. This is also a good community. But now this is a bad community! And now it’s just a big mess!]

Forms of overlap
• No overlap: just find the communities and model them
• Node overlap: the overlapping edges must be ignored
• Edge overlap: the overlapping edges must be handled
Assign every edge to at most 1 community

Are these good models?
Likelihood ratio test: the larger the ratio, the more likely our model is better

Ground-truth communities (LL ratio)
Data set       block model    HyCom
Amazon             26450.6    30997.1
DBLP (100)          3148.5     -788.0
DBLP             -264974.7    17958.1
Friendster        200627.6    17811.7
LiveJournal       154982.4    22705.8
Orkut              11945.3     1598.5
YouTube            75689.6    12660.0

Calculated communities (LL ratio)
Data set       spectral clustering    BMF       HyCom
Email                      10895.8    3552.0     250.1
Erdős                       1797.0     949.0     256.3
Jazz                        3003.8    4435.0    3718.5
PolBooks                     648.0     303.3     228.2

Distributions of parameters
[Figure: γ relative to community size, per data set: Amazon, DBLP(100), DBLP, Friendster, LiveJournal, Orkut, YouTube]
How clique-like are the communities?

Distribution of parameters
[Figure: H relative to community size, per data set: Amazon, DBLP(100), DBLP, Friendster, LiveJournal, Orkut, YouTube] How thick is the tail?
[Figure: x per data set: Amazon, DBLP(100), DBLP, Friendster, LiveJournal, Orkut, YouTube] How similar to HyCom are the models?

How to find the communities?
• We can use any clique-like community finding algorithm, and re-model
• But the found communities might not be very good
• We can try to grow the communities from cores
• But overlapping communities cause issues

Nested matrix redux
• Work in progress: express the adjacency matrix as a union of nested submatrices
• More general than our models but potentially easier
• Interesting in its own right
• We find the nested subgraphs by finding a suitable matrix factorization

Nested matrices as rank-1 structures
• Nested matrices can have full rank
• Their nonnegative rounding rank is 1
• Every nested matrix A can be expressed as A = thr(xy^T)
• x and y are nonnegative vectors
• thr(a) = 1 if a ≥ 0.5 and 0 otherwise
(Neumann et al., ICDM 2016)
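
A short sketch of the rounding construction follows. The vectors are chosen for illustration and are not taken from the slides; the function names are hypothetical. All products are exact binary fractions, so the comparison against 0.5 is not affected by rounding.

```python
def thr(a):
    """Threshold from the slide: 1 if a >= 0.5 and 0 otherwise."""
    return 1 if a >= 0.5 else 0


def rounded_rank1(x, y):
    """Return thr(x y^T) for nonnegative vectors x and y as a 0/1 matrix."""
    return [[thr(xi * yj) for yj in y] for xi in x]


x = [2.0, 1.0, 0.5]
y = [1.0, 0.5, 0.25]
A = rounded_rank1(x, y)
for row in A:
    print(row)
# [1, 1, 1]
# [1, 1, 0]
# [1, 0, 0]
```

The 3×3 matrix A has full rank, yet a single pair of vectors reproduces it after thresholding.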

Example
[Equation: a nested binary matrix expressed as the thresholding of a rank-1 matrix]

Combining nested matrices: problems
• thr([x_1 x_2][y_1 y_2]^T) is not necessarily a union of two nested matrices
• If x < θ and y < θ, it’s still possible that x + y ≥ θ
• Higher ranks work only for completely disjoint communities

Tropical algebra to the rescue
• The subtropical algebra over the nonnegative reals has
  • the summation ⊞ defined as the maximum
  • the product ⊠ defined as usual
• If x < θ and y < θ, x ⊞ y = max{x, y} < θ
• The thresholding distributes over ⊞: thr(x ⊞ y) = thr(x) ⊞ thr(y)
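
The difference between the two ways of combining rank-1 components shows up in a small example. This is a sketch with illustrative vectors and hypothetical helper names: two overlapping components are combined entrywise once with the ordinary sum and once with the maximum, and both results are compared with the union of the two thresholded matrices.

```python
def thr(a):
    """Threshold from the slides: 1 if a >= 0.5 and 0 otherwise."""
    return 1 if a >= 0.5 else 0


def outer(x, y):
    """Ordinary outer product x y^T as a list of rows."""
    return [[xi * yj for yj in y] for xi in x]


def combine(m1, m2, op):
    """Threshold the entrywise combination op(a, b) of two matrices."""
    return [[thr(op(a, b)) for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Two components that overlap on the middle node (illustrative values).
p1 = outer([1.0, 0.6, 0.0], [1.0, 0.6, 0.0])
p2 = outer([0.0, 0.6, 1.0], [0.0, 0.6, 1.0])

standard = combine(p1, p2, lambda a, b: a + b)  # ordinary sum
subtropical = combine(p1, p2, max)              # subtropical sum
union = [[max(thr(a), thr(b)) for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]

print(standard == union)
print(subtropical == union)
```

With the ordinary sum, the middle entry gets 0.36 + 0.36 = 0.72 and is rounded to 1 although neither component contains it; with the maximum it stays at 0.36 and is rounded to 0.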
