Hyperbolic Communities: Modelling Communities Beyond Cliques
Pauli Miettinen 8 June 2017
Joint work with
Saskia Metzler (MPI-INF), Stephan Günnemann (T.U. München), Stefan Neumann (Uni Wien), Sanjar Karaev (MPI-INF), Rainer Gemulla (Uni Mannheim)
Communities = cliques
modelled as (quasi-) cliques
everybody knows (or should know) everybody else
Communities ≠ cliques
not cliques
people
central people ⇒ Cliques are not a good model
Communities really are not cliques
≈ 160 StackExchange communities. Nodes are users; edges are comments or answers during the last year.
structures
The core/periphery model
sciences (from 1999)
connected to the core
appear both in core and in periphery
Borgatti & Everett, 1999
[Figure labels: Core, Periphery]
Core/periphery example
Borgatti & Everett, 1999
and columns can be
the row above it
ecology
nested
degree)
an edge (i, j) is in the community iff $i^{\alpha} j^{\alpha} > \tau$ for some $\alpha \le 0$ and $0 < \tau < 1$
$\tau' = \tau^{1/\alpha}$
[Figure: example community with α = -0.5, τ = 0.1, 380 edges]
Araujo et al. ECMLPKDD ’14
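A minimal Python sketch of the power-law rule above, assuming nodes are numbered from 1 in degree order and edges are unordered pairs i < j (both are assumptions; the slide does not say). It checks that thresholding i^α j^α > τ selects the same pairs as the product rule i·j < τ' with τ' = τ^(1/α), which holds for α < 0. The function names are illustrative, and τ = 0.15 is used instead of the slide's 0.1 only so that no product lands exactly on the boundary in floating point.

```python
def power_law_edges(n, alpha, tau):
    """Unordered pairs (i, j), 1 <= i < j <= n, with i^alpha * j^alpha > tau."""
    return {
        (i, j)
        for i in range(1, n + 1)
        for j in range(i + 1, n + 1)
        if (i ** alpha) * (j ** alpha) > tau
    }


def product_edges(n, bound):
    """Unordered pairs (i, j), 1 <= i < j <= n, with i * j < bound."""
    return {
        (i, j)
        for i in range(1, n + 1)
        for j in range(i + 1, n + 1)
        if i * j < bound
    }


n, alpha, tau = 50, -0.5, 0.15
tau_prime = tau ** (1 / alpha)  # roughly 44.4 for these values
edges = power_law_edges(n, alpha, tau)

# Both rules should describe the same community.
print(edges == product_edges(n, tau_prime))
```

Because α is negative, raising both sides to the power 1/α flips the inequality, which is why the product rule uses "<" where the power-law rule uses ">".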
Some comments
Tails taper towards the end
quite limited
are ordered and i=0, 1, …
(i, j) is in the community if (i + p)(j + p) ≤ θ
the hyperbola
gradient
[Figure: community boundary with the point (–p, –p) marked]
Metzler et al. ICDM 2016
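The rule (i + p)(j + p) ≤ θ can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: nodes are 0-indexed and already ordered, the diagonal is kept for simplicity, and the names `hycom_matrix` and `is_nested` are hypothetical. The check confirms that the resulting matrix is nested, meaning each row's 1s are contained in the 1s of the row above it.

```python
def hycom_matrix(n, p, theta):
    """0/1 matrix with entry (i, j) set iff (i + p) * (j + p) <= theta."""
    return [
        [1 if (i + p) * (j + p) <= theta else 0 for j in range(n)]
        for i in range(n)
    ]


def is_nested(matrix):
    """True if every row's 1s are a subset of the 1s of the row above it."""
    return all(
        all(lower <= upper for upper, lower in zip(above, below))
        for above, below in zip(matrix, matrix[1:])
    )


A = hycom_matrix(8, p=1, theta=12)
for row in A:
    print("".join(str(v) for v in row))
print(is_nested(A))
```

Larger p flattens the boundary, while larger θ grows the community as a whole.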
parameterized by the size
thickness of the tail, H
Tail cannot be thicker than the core
γ ≤ (n – 1 + H)/2
[Figure: example community annotated with γ and H]
(1 – x)(i∙j) + x(i + j) ≤ Σ for 0 ≤ x ≤ 1 and Σ ∈ ℝ
(1 – |x|)(ij) + x(i + j) ≤ Σ for –1 ≤ x ≤ 1
All the models are the same
For any valid pair of parameters for one of the three models above, there exist valid pairs of parameters for the other two models that model exactly the same graph
forward (re-parametrization)
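One such re-parametrization can be checked numerically. Expanding (i + p)(j + p) ≤ θ gives ij + p(i + j) ≤ θ – p², and dividing by 1 + p (for p ≥ 0) yields the mixed form with x = p/(1 + p) and Σ = (θ – p²)/(1 + p). The sketch below assumes 0-indexed nodes and uses exact fractions to avoid rounding at the boundary; the function names are illustrative and the slides' own proof may proceed differently.

```python
from fractions import Fraction


def hycom_edges(n, p, theta):
    """Pairs (i, j) with (i + p) * (j + p) <= theta."""
    return {(i, j) for i in range(n) for j in range(n)
            if (i + p) * (j + p) <= theta}


def mixed_edges(n, x, sigma):
    """Pairs (i, j) with (1 - |x|) * i * j + x * (i + j) <= sigma."""
    return {(i, j) for i in range(n) for j in range(n)
            if (1 - abs(x)) * i * j + x * (i + j) <= sigma}


def hycom_to_mixed(p, theta):
    """Map (p, theta) with p >= 0 to the equivalent (x, sigma)."""
    p, theta = Fraction(p), Fraction(theta)
    return p / (1 + p), (theta - p * p) / (1 + p)


x, sigma = hycom_to_mixed(1, 30)
print(x, sigma)
print(hycom_edges(20, 1, 30) == mixed_edges(20, x, sigma))
```

With p = 1 the mapping gives x = 1/2, the value the slides associate with HyCom.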
Hyperbolic vs. cliques and core/tail
community size
special case of the model
be checked)
Hyperbolic vs. power-law
(1 – x)(i∙j) + x(i + j) ≤ Σ
Technically HyCom is 1/2(i∙j) + 1/2(i + j) ≤ Σ
Some example communities
dense
sparse
|E|log(d) + |∁E|log(1–d) + |O|log(s) + |∁O|log(1–s)
|E| = edges in comm., |∁E| = non-edges in comm., d = density of comm., |O| = edges outside of comm., |∁O| = non-edges outside comm.
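A hedged sketch of this score as a two-block Bernoulli log-likelihood: edges and non-edges inside the community are scored with density d, and those outside with density s. It assumes that d and s are the observed edge fractions of their regions and that non-edges outside contribute log(1 – s); the function name is hypothetical.

```python
import math


def community_log_likelihood(edges_in, non_edges_in, edges_out, non_edges_out):
    """Sum of Bernoulli log-likelihoods for the inside and outside regions."""

    def region(ones, zeros):
        total = ones + zeros
        # An empty, all-ones or all-zeros region is modelled perfectly.
        if total == 0 or ones == 0 or zeros == 0:
            return 0.0
        density = ones / total
        return ones * math.log(density) + zeros * math.log(1 - density)

    return region(edges_in, non_edges_in) + region(edges_out, non_edges_out)


# A dense community (10 of 10 pairs) in a sparse background (5 of 90 pairs)
# versus treating the whole graph as a single region.
with_community = community_log_likelihood(10, 0, 5, 85)
single_region = community_log_likelihood(0, 0, 15, 85)
print(with_community > single_region)
```

A higher (less negative) value means the split into a dense inside and a sparse outside explains the graph better.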
[Figure labels: Dense, Sparse]
More than one community
straight forward
sparse
Overlapping communities!
This is a good community
This is also a good community
But now this is a bad community!
And now it’s just a big mess!
No overlap: Just find the communities and model them
Node overlap: The overlapping edges must be ignored
Edge overlap: The overlapping edges must be handled. Assign every edge to at most 1 community
Are these good models?
Likelihood ratio test: the larger the ratio, the more likely our model is better

Ground-truth communities (LL ratio: block model, HyCom)
Amazon: 26450.6, 30997.1
DBLP (100): 3148.5
DBLP: 17958.1
Friendster: 200627.6, 17811.7
LiveJournal: 154982.4, 22705.8
Orkut: 11945.3, 1598.5
YouTube: 75689.6, 12660.0

Calculated communities (LL ratio: spectral clustering, BMF, HyCom)
Email: 10895.8, 3552.0, 250.1
Erdős: 1797.0, 949.0, 256.3
Jazz: 3003.8, 4435.0, 3718.5
PolBooks: 648.0, 303.3, 228.2
Distributions of parameters
[Figure: parameter relative to community size for Amazon, DBLP(100), DBLP, Friendster, LiveJournal, Orkut, YouTube]
How clique-like are the communities?
Distribution of parameters
[Figure: H relative to community size for Amazon, DBLP(100), DBLP, Friendster, LiveJournal, Orkut, YouTube]
How thick is the tail?
[Figure: x for Amazon, DBLP(100), DBLP, Friendster, LiveJournal, Orkut, YouTube]
How similar to HyCom are the models?
How to find the communities?
algorithm, and re-model
very good
cores
matrix as a union of nested submatrices
potentially easier
suitable matrix factorization
Nested matrices as rank-1 structures
as $A = \operatorname{thr}(x y^{T})$
Neumann et al. ICDM 2016
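\[
\begin{pmatrix} 4 \\ 2 \\ 1 \end{pmatrix}
\begin{pmatrix} 1 & 1/2 & 1/4 \end{pmatrix}
=
\begin{pmatrix} 4 & 2 & 1 \\ 2 & 1 & 1/2 \\ 1 & 1/2 & 1/4 \end{pmatrix}
\]
\[
\operatorname{thr}_{1/2+}
\begin{pmatrix} 4 & 2 & 1 \\ 2 & 1 & 1/2 \\ 1 & 1/2 & 1/4 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\]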
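The example with x = (4, 2, 1) and y = (1, 1/2, 1/4) can be reproduced in a few lines of Python. The sketch assumes the threshold "1/2+" means "strictly greater than 1/2", which is the reading that yields a matrix with six 1s; the function name is illustrative.

```python
def thr_rank1(x, y, t):
    """Threshold the outer product x y^T: entry is 1 iff x[i] * y[j] > t."""
    return [[1 if xi * yj > t else 0 for yj in y] for xi in x]


A = thr_rank1([4, 2, 1], [1, 0.5, 0.25], 0.5)
for row in A:
    print(row)
```

Sorting x and y in decreasing order is what makes the thresholded matrix nested: moving down or right can only shrink the product.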
Combining nested matrices: problems
x + y ≥ θ
disjoint communities
Tropical algebra to the rescue
nonnegative reals has
thr(x ⊞ y) = thr(x) ⊞ thr(y)
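The identity can be illustrated with a small Python sketch, assuming ⊞ is the elementwise maximum (the max-times, or subtropical, convention) and thr(v) = 1 iff v ≥ θ. Under ordinary addition, two values that are each below θ can sum to something above it, which creates an edge that belongs to neither community; taking the maximum avoids this. The vectors are made-up illustrative values.

```python
def thr(values, theta):
    """Elementwise threshold: 1 where the value is at least theta."""
    return [1 if v >= theta else 0 for v in values]


x = [0.7, 0.3, 0.1, 0.3]
y = [0.2, 0.3, 0.8, 0.1]
theta = 0.5

# On 0/1 vectors the maximum is the logical OR, i.e. the union.
union = [max(a, b) for a, b in zip(thr(x, theta), thr(y, theta))]
tropical = thr([max(a, b) for a, b in zip(x, y)], theta)
ordinary = thr([a + b for a, b in zip(x, y)], theta)

print(union)
print(tropical)
print(ordinary)
```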
nonnegative matrix B with k columns minimizing $\|A - \operatorname{thr}(B \boxtimes B^{T})\|$
algebra
permutations) matrices $N_1, N_2, \ldots, N_k$ that minimize $\|A - \bigcup_i N_i\|$
Karaev et al. submitted
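The link between the two formulations can be sketched in Python, assuming ⊠ is the max-times matrix product, (B ⊠ C)[i][j] = max over l of B[i][l]·C[l][j], and thr(v) = 1 iff v ≥ θ. Under those assumptions, thresholding B ⊠ Bᵀ gives exactly the union of the thresholded rank-1 matrices of the columns of B. This is an illustration of the identity only, not the fitting algorithm; B and θ are made-up values.

```python
def max_times(B, C):
    """Max-times product of an n-by-k matrix B and a k-by-m matrix C."""
    k, m = len(C), len(C[0])
    return [[max(row[l] * C[l][j] for l in range(k)) for j in range(m)]
            for row in B]


def transpose(M):
    return [list(col) for col in zip(*M)]


def thr(M, theta):
    return [[1 if v >= theta else 0 for v in row] for row in M]


def union_of_rank1(B, theta):
    """Union over columns b of thr(b b^T)."""
    n, k = len(B), len(B[0])
    return [[1 if any(B[i][l] * B[j][l] >= theta for l in range(k)) else 0
             for j in range(n)] for i in range(n)]


# Two communities that share node 2.
B = [[4, 0], [2, 0], [1, 1], [0, 2], [0, 4]]
theta = 2

tropical = thr(max_times(B, transpose(B)), theta)
ordinary = thr([[sum(a * b for a, b in zip(r, s)) for s in B] for r in B], theta)

print(tropical == union_of_rank1(B, theta))
print(tropical[2][2], ordinary[2][2])
```

At the shared node, the ordinary product adds the two contributions and crosses the threshold, while the max-times product does not.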
Resulting communities
[Figure panels: Hyperbolic; Almost hyperbolic; Not at all hyperbolic]
with more data
communities