slide-1
SLIDE 1

Structural sparsity in the real world

Erik Demaine∗, Felix Reidl, Peter Rossmanith, Fernando Sánchez Villaamil, Blair D. Sullivan† and Somnath Sikdar

Theoretical Computer Science

∗MIT †NCSU

@Bergen 2015

slide-2
SLIDE 2

Contents

  • The Program
  • Structural Sparseness
  • Models
  • Algorithms
  • Empirical Sparseness

slide-3
SLIDE 3

The Program

slide-4
SLIDE 4

Complex networks

Ubiquitous in real world

Empirical structure

  • Small-world
  • Heavy-tailed degree seq.
  • Clustering

Algorithmic applications

  • Disease spreading
  • Attack resilience
  • Fraud detection
  • Drug discovery

Structural graph theory

Well-researched

Deep structural theorems

  • WQO by minor relation
  • Decomposition theorems
  • Grid-theorem

Great algorithmic properties

  • (E)PTAS
  • Subexponential algorithms
  • Linear kernels
  • Model-checking

Can we bring these two fields together?

slide-5
SLIDE 5

The idea

1 Bridge the gap by identifying a notion of sparseness that applies to complex networks.

2 Develop algorithmic tools for network related problems.

3 Show experimentally that the above is useful in practice.

slide-6
SLIDE 6

The idea

1 Bridge the gap by identifying a notion of sparseness that applies to complex networks.

  • Need general and stable notion of sparseness.
  • How to prove that it holds for complex networks?

2 Develop algorithmic tools for network related problems.

  • Unclear what problems are interesting.

3 Show experimentally that the above is useful in practice.

  • Show that structural sparseness appears in the real world.
  • Show that algorithms can compete with known approaches.
slide-7
SLIDE 7

Structural Sparseness

slide-8
SLIDE 8

[Figure: inclusion diagram of sparse graph classes: star forests, linear forests, forests, bounded treedepth, bounded treewidth, outerplanar, planar, bounded genus, bounded degree, excluding a minor, excluding a topological minor, bounded expansion, locally bounded treewidth, locally excluding a minor, locally bounded expansion, nowhere dense.]

slide-9
SLIDE 9

Bounded expansion

A graph class has bounded expansion if the density of its minors only depends on their depth.

The following operations on a class of bounded expansion result again in a class of bounded expansion:

  • Taking shallow minors/immersions (in particular subgraphs)
  • Adding a universal vertex
  • Replacing each vertex by a small clique (lexicographic product)
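
The informal definition above can be written out as follows. This is a sketch in the standard notation for shallow minors; the symbols \nabla_r, f and \mathcal{G} are the usual textbook names and are not taken from the slide text itself.

```latex
% An r-shallow minor of G is obtained by contracting pairwise disjoint
% connected subgraphs of radius at most r and then taking a subgraph.
\[
  \nabla_r(G) \;=\; \max_{H \text{ an } r\text{-shallow minor of } G}
  \frac{|E(H)|}{|V(H)|}
\]
% A class \mathcal{G} has bounded expansion if the density of depth-r
% minors is bounded by a function of r alone:
\[
  \exists f \colon \mathbb{N} \to \mathbb{R} \quad
  \forall G \in \mathcal{G} \;\; \forall r \in \mathbb{N} : \quad
  \nabla_r(G) \le f(r).
\]
```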
slide-10
SLIDE 10

Models

slide-11
SLIDE 11

[Figure: random graph models with heavy-tailed degree distribution: Chung-Lu, perturbed bounded degree, stochastic block, configuration, Kleinberg, Barabási-Albert (Π(k) ∝ k).]

slide-12
SLIDE 12

The positive side

Name                    Definition f(d)                        Parameters
Power law               d^{-γ}                                 γ > 2
Power law w/ cutoff     d^{-γ} e^{-λd}                         γ > 2, λ > 0
Exponential             e^{-λd}                                λ > 0
Stretched exponential   d^{β-1} e^{-λ d^β}                     λ, β > 0
Gaussian                exp(-(d-µ)^2 / (2σ^2))                 µ, σ
Log-normal              d^{-1} exp(-(log d-µ)^2 / (2σ^2))      µ, σ

Theorem

Let D be an asymptotic degree distribution with finite mean. Then random graphs generated by the Configuration Model or the Chung-Lu model with parameter D have bounded expansion with high probability.
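
To make the Chung-Lu model concrete, here is a minimal sketch of one common formulation: every pair {u, v} becomes an edge independently with probability min(1, w_u · w_v / Σw), so the expected degree of u is roughly w_u. The function name `chung_lu`, the weight list and the choice to omit self-loops are assumptions for illustration, not details taken from the talk.

```python
import random


def chung_lu(weights, seed=None):
    """Sample a simple graph in the Chung-Lu style (illustrative sketch).

    weights[u] is the target expected degree of vertex u. Each pair
    u < v is joined independently with probability
    min(1, weights[u] * weights[v] / sum(weights)). Self-loops are
    omitted, which is a simplification made for this sketch.
    """
    rng = random.Random(seed)
    n = len(weights)
    total = sum(weights)
    edges = set()
    if total == 0:
        return edges  # no weight means no edges
    for u in range(n):
        for v in range(u + 1, n):
            p = min(1.0, weights[u] * weights[v] / total)
            if rng.random() < p:
                edges.add((u, v))
    return edges
```

This quadratic loop is only meant for small experiments; drawing `weights` from one of the distributions in the table above gives a graph with the corresponding expected degree sequence.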

slide-13
SLIDE 13

The positive side

Theorem

The perturbed bounded degree model has bounded expansion with high probability. Perturbing forests of S_{√n} results in a somewhere dense class.

slide-14
SLIDE 14

The negative side

Theorem

The Kleinberg Model is somewhere dense with high probability.

Theorem

The Barabási-Albert Model is somewhere dense with non-vanishing probability.

slide-15
SLIDE 15

[Figure: the random graph models with heavy-tailed degree distribution (Chung-Lu, perturbed bounded degree, stochastic block, configuration, Kleinberg, Barabási-Albert), each marked as either bounded expansion or somewhere dense.]

slide-16
SLIDE 16

Algorithms

slide-17
SLIDE 17

Neighbourhood sizes

Measure       Definition                                              Localized
Closeness     (Σ_{u∈V(G)} d(v, u))^{-1}                               (Σ_{u∈N^r(v)} d(v, u))^{-1}
Harmonic      Σ_{u∈V(G)} d(v, u)^{-1}                                 Σ_{u∈N^r(v)} d(v, u)^{-1}
Lin's index   |{u | d(v, u) < ∞}|^2 / Σ_{u∈V(G): d(v,u)<∞} d(v, u)    |N^r[v]|^2 / Σ_{u∈N^r[v]} d(v, u)

Theorem

Let G be a graph class of bounded expansion. There is an algorithm that for every r ∈ N and G ∈ G computes the size of the i-th neighbourhood of every vertex of G, for all i ≤ r, in linear time.
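
As an illustration of the localized closeness from the table, the following sketch evaluates (Σ_{u∈N^r(v)} d(v, u))^{-1} with a breadth-first search truncated at radius r. It is a plain BFS, not the linear-time algorithm of the theorem; the function name, the adjacency-dict input and the convention of returning 0.0 for an empty neighbourhood are assumptions.

```python
from collections import deque


def localized_closeness(adj, v, r):
    """Return 1 / (sum of d(v, u) over all u at distance 1..r from v).

    adj maps each vertex to an iterable of its neighbours. Returns 0.0
    when no other vertex lies within distance r (a convention chosen
    for this sketch).
    """
    dist = {v: 0}
    queue = deque([v])
    total = 0
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue  # do not explore beyond radius r
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                total += dist[y]
                queue.append(y)
    return 1.0 / total if total else 0.0
```

On the path 0-1-2-3, calling `localized_closeness(adj, 0, 2)` sums the distances 1 and 2 and returns 1/3.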

slide-18
SLIDE 18

Closeness centrality

[Figure: network with vertices labelled by researcher names, illustrating the localized closeness (Σ_{u∈N^1(v)} d(v, u))^{-1}.]

Network provided by Pål

slide-19
SLIDE 19

Closeness centrality

[Figure: network with vertices labelled by researcher names, illustrating the localized closeness (Σ_{u∈N^2(v)} d(v, u))^{-1}.]

Network provided by Pål

slide-20
SLIDE 20

Closeness centrality

[Figure: network with vertices labelled by researcher names, illustrating the localized closeness (Σ_{u∈N^3(v)} d(v, u))^{-1}.]

Network provided by Pål

slide-21
SLIDE 21

Closeness centrality

[Figure: network with vertices labelled by researcher names, illustrating the localized closeness (Σ_{u∈N^4(v)} d(v, u))^{-1}.]

Network provided by Pål

slide-22
SLIDE 22

Top-10% recovery

[Figure: plot with axes "Jaccard similarity of top 10%" and "Percentage of diameter", both ranging from 0.2 to 1, with one curve per network: Netscience, Codeminer, Diseasome, Cpan-distr., HepTh, CondMat.]
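
The quantity on this plot can be sketched as follows, under the assumption that the top 10% of vertices by a localized measure is compared with the top 10% by the corresponding global measure. The helper names and the tie-breaking rule are hypothetical.

```python
def top_fraction(scores, fraction=0.1):
    """Return the set of vertices with the highest scores.

    scores maps vertex -> centrality value. Ties are broken by vertex
    order (an arbitrary choice for this sketch), and at least one
    vertex is always returned.
    """
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

A value of 1 means the localized measure recovers exactly the same top 10% as the global one.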

slide-23
SLIDE 23

Counting substructures

Theorem

Given a graph H on h vertices, a graph G on n vertices and a treedepth decomposition of G of height t, one can compute the number of

  • isomorphisms from H to subgraphs of G,
  • homomorphisms from H to subgraphs of G, or
  • (induced) subgraphs of G isomorphic to H

in time O(8^h · t^h · h^2 · n) and space O(4^h · t^h · ht · log n).

slide-24
SLIDE 24

Counting substructures

Theorem (Nešetřil & Ossona de Mendez)

Let G be a class of bounded expansion. There exists a function f such that for every p, every member of G has a p-centered coloring with at most f(p) colors. Moreover, such a coloring can be computed in linear time.
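
For reference, the usual definition of a p-centered coloring can be stated as below. The symbols c, k and H are conventional names and do not appear in the slide text.

```latex
% A coloring c : V(G) \to \{1, \dots, k\} is p-centered if every
% connected subgraph H of G either uses more than p colors or
% contains some color exactly once:
\[
  \forall H \subseteq G \text{ connected}: \quad
  |c(V(H))| > p
  \;\;\text{or}\;\;
  \exists i : \; |c^{-1}(i) \cap V(H)| = 1 .
\]
```

In such a coloring, every subgraph induced by fewer than p color classes has treedepth smaller than p, which is what links these colorings to the treedepth-based counting theorem.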

slide-26
SLIDE 26

5-centered coloring of gcc of netscience graph.
slide-29
SLIDE 29

How many [figure: pattern] in a [figure: graph]?

slide-30
SLIDE 30

Example: Counting P4s

Preprocessing: create k-Patterns (here: k = 2)

  • Take pattern graph P4
  • Choose separator
  • Choose component
  • Label separator (labels 1 and 2)

slide-31
SLIDE 31

Example: Counting P4s

[Figure sequence, slides 31 to 43: step-by-step run of the P4-counting procedure on an example target graph, ending with the counts 7, 6 and 14.]

There are seven P4s in the target graph.
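
To make precise what is being counted, the sketch below counts the (not necessarily induced) subgraphs isomorphic to P4 by brute force. It is not the pattern-based dynamic program walked through on the slides, only a reference check for small graphs; the function name and the adjacency-dict input are assumptions.

```python
def count_p4(adj):
    """Count subgraphs isomorphic to P4 (a path on four vertices).

    adj maps each vertex to a set of neighbours in an undirected simple
    graph. Every path a-b-c-d is found once for the ordered middle edge
    (b, c) and once for (c, b), hence the division by two.
    """
    total = 0
    for b in adj:
        for c in adj[b]:
            for a in adj[b]:
                if a == c:
                    continue
                for d in adj[c]:
                    if d == b or d == a:
                        continue
                    total += 1
    return total // 2
```

For example, a 4-cycle contains four such paths (one for each edge that is left out), and the complete graph on four vertices contains twelve.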

slide-44
SLIDE 44

[Figure: network with named vertices (Beak, Beescratch, Bumper, ..., Zipfel).]

Empirical Sparseness

slide-45
SLIDE 45

Closing the gap

In order to claim that our approach is useful in practice we cannot just rely on theory.

  • Graph classes vs. concrete instances
  • The bounds given by our proofs are enormous.
  • Random graph models capture only some aspects of complex networks.
  • We prove asymptotic bounds (although we show fast convergence).

slide-46
SLIDE 46

Network        Vertices    Edges   p=2   p=3    p=4    p=5    p=6    p=∞
Airlines            235     1297    11    28     39     47     55     64
C.Elegans           306     2148     8    36     74     83    118    153
Codeminer           724     1017     5    10     15     17     23     51
Cpan-authors        839     2212     9    24     34     43     47    224
Diseasome          1419     2738    12    17     22     25     30     30
Polblogs           1491    16715    30   118    286    354    392    603
Netscience         1589     2742    20    20     28     28     28     20
Drosophila         1781     8911    12    65    137    188    263    395
Yeast              2284     6646    12    38    178    254    431    408
Cpan-distr.        2719     5016     5    14     32     42     56    224
Twittercrawl       3656   154824    89   561   1206   1285   1341      –
Power              4941     6594     6    12     20     21     34     95
AS Jan 2000        6474    13895    12    29     70    102    151    357
Hep-th             7610    15751    24    25    104    328    360    558
Gnutella04        10876    39994     8    43    626      –      –      –
ca-HepPh          12008   118489   239   296   1002      –      –      –
CondMat           16264    47594    18    47    255   1839      –   1310
ca-CondMat        23133    93497    26    89    665      –      –      –
Enron             36692   183831    27   214   1428      –      –      –
Brightkite        58228   214078    39   193   1421      –      –      –

slide-47
SLIDE 47

Network        Vertices    Edges   p=2   p=3    p=4    p=5    p=6    p=∞
Airlines            235     1297    11    28     39     47     55     64
Power              4941     6594     6    12     20     21     34     95
AS Jan 2000        6474    13895    12    29     70    102    151    357
C.Elegans           306     2148     8    36     74     83    118    153
Diseasome          1419     2738    12    17     22     25     30     30
Drosophila         1781     8911    12    65    137    188    263    395
Yeast              2284     6646    12    38    178    254    431    408
Codeminer           724     1017     5    10     15     17     23     51
Gnutella04        10876    39994     8    43    626      –      –      –
Enron             36692   183831    27   214   1428      –      –      –
Brightkite        58228   214078    39   193   1421      –      –      –
Cpan-authors        839     2212     9    24     34     43     47    224
Polblogs           1491    16715    30   118    286    354    392    603
Netscience         1589     2742    20    20     28     28     28     20
Cpan-distr.        2719     5016     5    14     32     42     56    224
Twittercrawl       3656   154824    89   561   1206   1285   1341      –
Hep-th             7610    15751    24    25    104    328    360    558
ca-HepPh          12008   118489   239   296   1002      –      –      –
CondMat           16264    47594    18    47    255   1839      –   1310
ca-CondMat        23133    93497    26    89    665      –      –      –

slide-48
SLIDE 48

Network        Vertices    Edges    p=2    p=3     p=4      p=5     p=6     p=∞
Airlines            235     1297   1.00   2.55    3.55     4.27    5.00    5.82
Power              4941     6594   1.00   2.00    3.33     3.50    5.67   15.83
AS Jan 2000        6474    13895   1.00   2.42    5.83     8.50   12.58   29.75
C.Elegans           306     2148   1.00   4.50    9.25    10.38   14.75   19.12
Diseasome          1419     2738   1.00   1.42    1.83     2.08    2.50    2.50
Drosophila         1781     8911   1.00   5.42   11.42    15.67   21.92   32.92
Yeast              2284     6646   1.00   3.17   14.83    21.17   35.92   34.00
Codeminer           724     1017   1.00   2.00    3.00     3.40    4.60   10.20
Gnutella04        10876    39994   1.00   5.38   78.25        –       –       –
Enron             36692   183831   1.00   7.93   52.89        –       –       –
Brightkite        58228   214078   1.00   4.95   36.44        –       –       –
Cpan-authors        839     2212   1.00   2.67    3.78     4.78    5.22   24.89
Polblogs           1491    16715   1.00   3.93    9.53    11.80   13.07   20.10
Netscience         1589     2742   1.00   1.00    1.40     1.40    1.40    1.00
Cpan-distr.        2719     5016   1.00   2.80    6.40     8.40   11.20   44.80
Twittercrawl       3656   154824   1.00   6.30   13.55    14.44   15.07       –
Hep-th             7610    15751   1.00   1.04    4.33    13.67   15.00   23.25
ca-HepPh          12008   118489   1.00   1.24    4.19        –       –       –
CondMat           16264    47594   1.00   2.61   14.17   102.17       –   72.78
ca-CondMat        23133    93497   1.00   3.42   25.58        –       –       –
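
The entries of this table appear to be the values of the preceding table divided by the p = 2 entry of the same row (for Airlines, 28 / 11 ≈ 2.55). The slides do not state this explicitly, so the sketch below is an inferred reading; the function name is hypothetical and missing entries are represented as None.

```python
def normalise(row):
    """Divide every entry of a row by its first (p = 2) entry.

    row is a list of counts for p = 2, 3, 4, 5, 6, infinity; entries
    missing from the table are passed as None and stay None.
    """
    base = row[0]
    return [None if x is None else round(x / base, 2) for x in row]


airlines = normalise([11, 28, 39, 47, 55, 64])
enron = normalise([27, 214, 1428, None, None, None])
```

Reading the rows this way makes the growth with p comparable across networks of very different size.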

slide-49
SLIDE 49

Network structure

slide-50
SLIDE 50

Conclusion

  • We show that several important models of complex networks have bounded expansion.
  • Besides the known algorithms (first-order model checking!) we show that relevant problems can be solved faster by using this fact.
  • Our experiments demonstrate that many networks are structurally sparse.

slide-51
SLIDE 51

Conclusion

  • We show that several important models of complex networks have bounded expansion.
  • Besides the known algorithms (first-order model checking!) we show that relevant problems can be solved faster by using this fact.
  • Our experiments demonstrate that many networks are structurally sparse.

THANKS!

Questions?