Algorithms for Big Data (VII), Chihao Zhang, Shanghai Jiao Tong University


SLIDE 1

Algorithms for Big Data (VII)

Chihao Zhang

Shanghai Jiao Tong University

Nov. 1, 2019

Algorithms for Big Data (VII) 1/17

SLIDE 2

Review

We introduced graph streams last week. The graph has n vertices, but its edges arrive in a streaming fashion. The goal is to compute graph properties using o(n²) space. This can be done for connectivity and bipartiteness.

Algorithms for Big Data (VII) 2/17

SLIDE 3

Shortest Path

Given an undirected simple graph G = (V, E), we want to answer queries of the form "what is the minimum distance between u and v?" for u, v ∈ V. Our algorithm computes a subgraph H = (V, EH) of G such that

  ∀u, v ∈ V:  dG(u, v) ≤ dH(u, v) ≤ α · dG(u, v)

for some constant α ≥ 1. Such an H is called an α-spanner of G.

Algorithms for Big Data (VII) 3/17

SLIDE 4

Algorithm Shortest Path
  Init:
    EH ← ∅
  On input (u, v):
    if dH(u, v) ≥ α + 1 then
      EH ← EH ∪ {(u, v)}
    end if
  On query (u, v):
    output dH(u, v)

Algorithms for Big Data (VII) 4/17
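The update rule above can be sketched in Python. This is a minimal illustration, not the lecture's code: the names `spanner_stream` and `bfs_dist` are mine, and a depth-capped BFS stands in for whatever distance oracle a real implementation would use.

```python
from collections import deque

def bfs_dist(adj, u, v, cap):
    # Distance from u to v in the current subgraph H, exploring only
    # to depth `cap`; returns cap + 1 if v is farther (or unreachable).
    if u == v:
        return 0
    seen = {u: 0}
    q = deque([u])
    while q:
        x = q.popleft()
        if seen[x] >= cap:
            continue
        for y in adj.get(x, ()):
            if y not in seen:
                seen[y] = seen[x] + 1
                if y == v:
                    return seen[y]
                q.append(y)
    return cap + 1

def spanner_stream(edges, alpha):
    # Keep an arriving edge only if its endpoints are currently
    # at distance >= alpha + 1 in the subgraph H built so far.
    adj, H = {}, []
    for u, v in edges:
        if bfs_dist(adj, u, v, alpha) >= alpha + 1:
            H.append((u, v))
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return H
```

On the complete graph K5 with α = 3, the first four edges incident to vertex 0 are kept and every later edge is rejected at distance 2, so H is a star.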

SLIDE 5

Clearly dH(u, v) ≥ dG(u, v), as H contains fewer edges than G. Consider the shortest path from u to v in G: u = x1, x2, . . . , xk = v. Then

  dG(u, v) = ∑_{i=1}^{k−1} dG(xi, xi+1).

▶ If (xi, xi+1) ∈ EH, then dH(xi, xi+1) = dG(xi, xi+1) = 1.
▶ If (xi, xi+1) ∉ EH, then when we tried to insert (xi, xi+1) into EH, it must have held that dH(xi, xi+1) ≤ α.

In all, we have dH(u, v) ≤ α · dG(u, v).

Algorithms for Big Data (VII) 5/17
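Combining the two cases over the whole path gives the claimed stretch bound, spelled out (using that each consecutive pair is an edge of G and that distances in H only shrink as edges arrive):

```latex
d_H(u,v) \;\le\; \sum_{i=1}^{k-1} d_H(x_i, x_{i+1})
         \;\le\; \sum_{i=1}^{k-1} \alpha \cdot d_G(x_i, x_{i+1})
         \;=\; \alpha \cdot d_G(u,v).
```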

SLIDE 6

Space Consumption

We need a bit of graph theory to analyze the space consumption. The girth g(G) of a graph G is the length of its shortest cycle. It is clear that g(H) ≥ α + 2 (an edge (u, v) is only added when dH(u, v) ≥ α + 1, so it closes no cycle shorter than α + 2).

Theorem
Let G = (V, E) be a sufficiently large graph with g(G) ≥ k. Let n = |V| and m = |E|. Then

  m ≤ n + n^{1 + 1/⌊(k−1)/2⌋}.

Algorithms for Big Data (VII) 6/17

SLIDE 7

The k-core of a graph G is a subgraph whose minimum degree is at least k. Let d = 2m/n be the average degree of G; then G contains a d/2-core. (Why?) The d/2-core has girth at least k, so we can grow a BFS tree in it of depth ⌊(k−1)/2⌋ in which every vertex has at least d/2 − 1 children. The number of vertices therefore satisfies

  n ≥ (d/2 − 1)^{⌊(k−1)/2⌋} = (m/n − 1)^{⌊(k−1)/2⌋}.

This bound is in fact tight; can you prove it?

Algorithms for Big Data (VII) 7/17
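To get a feel for the numbers, here is a small sketch (the function name is mine) that evaluates the theorem's bound for the spanner of stretch α, whose girth is at least k = α + 2:

```python
def spanner_edge_bound(n, alpha):
    # The spanner H has girth g(H) >= alpha + 2 = k, so by the
    # theorem |E_H| <= n + n^(1 + 1/floor((k - 1) / 2)).
    t = (alpha + 1) // 2          # floor((k - 1) / 2) with k = alpha + 2
    return n + n ** (1 + 1 / t)
```

For α = 3 the bound is about n^{3/2}, already o(n²) as the review slide demands.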

SLIDE 8

Matchings

Let G = (V, E) be a graph. A matching is a set M ⊆ E of edges sharing no vertex. Finding a maximum matching is a famous polynomial-time solvable problem. Now we try to approximate it in the streaming setting.

Algorithms for Big Data (VII) 8/17

SLIDE 9

Algorithm Maximum Matching
  Init:
    M ← ∅
  On input (u, v):
    if M ∪ {(u, v)} is a matching then
      M ← M ∪ {(u, v)}
    end if
  Output:
    output |M|

Algorithms for Big Data (VII) 9/17
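The greedy pass is a few lines of Python. This is an illustrative sketch with names of my choosing; the brute-force checker exists only to verify the 2-approximation on tiny graphs.

```python
from itertools import combinations

def greedy_matching(edges):
    # One pass over the stream: keep an edge iff it conflicts
    # with nothing kept so far.
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

def max_matching_size(edges):
    # Brute force over edge subsets, only for checking the
    # 2-approximation on tiny graphs.
    for r in range(len(edges), 0, -1):
        for sub in combinations(edges, r):
            cover = [x for e in sub for x in e]
            if len(cover) == len(set(cover)):
                return r
    return 0
```

On the path 0-1-2-3 streamed as (1,2), (0,1), (2,3), greedy keeps only (1,2) while the optimum is 2, showing the factor-2 loss is real.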

SLIDE 10

Let M denote our estimate and M∗ denote a maximum matching.

Theorem
  |M∗|/2 ≤ |M| ≤ |M∗|.

M is a maximal matching, so every edge of M∗ shares a vertex with some edge of M. Each e ∈ M intersects at most two edges in M∗.

Algorithms for Big Data (VII) 10/17

SLIDE 11

Maximum Weighted Matching

Each edge e ∈ E is associated with a non-negative weight w(e) ≥ 0. Compute a matching M maximizing ∑_{e∈M} w(e).

Algorithm Maximum Weighted Matching
  Init:
    M ← ∅
  On input (u, v):
    if M ∪ {(u, v)} is a matching then
      M ← M ∪ {(u, v)}
    else
      C ← {e ∈ M : u ∈ e ∨ v ∈ e}
      if w(u, v) > 2w(C) then
        M ← (M \ C) ∪ {(u, v)}
      end if
    end if
  Output:
    output w(M)

Algorithms for Big Data (VII) 11/17
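A dict-based sketch of this pseudocode (function name mine; the stream carries (u, v, weight) triples):

```python
def weighted_matching_stream(stream):
    # M maps each kept edge (u, v) to its weight. An arriving edge
    # replaces its set C of conflicting edges only if it outweighs 2 w(C).
    M = {}
    for u, v, w in stream:
        C = {e: wc for e, wc in M.items() if u in e or v in e}
        if not C:
            M[(u, v)] = w
        elif w > 2 * sum(C.values()):
            for e in C:
                del M[e]
            M[(u, v)] = w
    return M
```

For example, on the stream (0,1,2), (2,3,2), (1,2,10), the heavy edge (1,2) evicts both earlier edges because 10 > 2·(2+2).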

SLIDE 12

Analysis

We use a charging argument to analyze the algorithm. We say an edge e is:

▶ born if it was added to M;
▶ dead if it was removed from M;
▶ murdered by e′ if it died because we added e′.

For every e ∈ M, we define its family of victims:

  C0(e) = {e},
  C1(e) = {edges murdered by e},
  . . .
  Ci(e) = ∪_{f ∈ Ci−1(e)} {edges murdered by f},
  . . .

Algorithms for Big Data (VII) 12/17

SLIDE 13

Lemma
  For every e,  w(∪_{i≥1} Ci(e)) ≤ w(e).

Proof.
By the definition of murdering, w(Ci+1(e)) ≤ w(Ci(e))/2. Therefore

  2 ∑_{i≥1} w(Ci(e)) ≤ ∑_{i≥0} w(Ci(e)) = w(e) + ∑_{i≥1} w(Ci(e)),

which rearranges to ∑_{i≥1} w(Ci(e)) ≤ w(e).

Algorithms for Big Data (VII) 13/17
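Equivalently, unrolling the recursion w(Ci+1(e)) ≤ w(Ci(e))/2 yields a geometric series, which bounds the total victim weight by w(e):

```latex
w(C_i(e)) \le \frac{w(C_{i-1}(e))}{2} \le \cdots \le \frac{w(C_0(e))}{2^{i}} = \frac{w(e)}{2^{i}},
\qquad\text{hence}\qquad
\sum_{i \ge 1} w(C_i(e)) \le w(e) \sum_{i \ge 1} 2^{-i} = w(e).
```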

SLIDE 14

Lemma
  w(M∗) ≤ ∑_{e∈M} (4w(e) + 2w(∪_{i≥1} Ci(e))).

We consider the edges e∗1, e∗2, . . . of M∗ in the order of the stream.

▶ If e∗i is born, charge w(e∗i) to e∗i.
▶ If e∗i is not born, charge w(e∗i) to its conflicting edges (w(e∗i) is divided proportionally to the weights of the conflicting edges).
▶ If some e′ = (u, v) murdered some e = (u′, v), and e had been charged by some e∗ = (u′′, v), then move the charge from e to e′.

Algorithms for Big Data (VII) 14/17

SLIDE 15

At last, we have:

▶ for every e ∈ M, its charge is at most 4w(e);
▶ for every e ∈ ∪_{i≥1} Ci(e′) for some e′ ∈ M, its charge is at most 2w(e).

Therefore,

  w(M∗) ≤ ∑_{e∈M} (4w(e) + 2w(∪_{i≥1} Ci(e))) ≤ ∑_{e∈M} 6w(e) = 6w(M).

The analysis is not pushed to the limit yet; can you improve the approximation ratio 6? (Exercise)

Algorithms for Big Data (VII) 15/17

SLIDE 16

Counting Triangles

An important topic is to count the copies of some fixed subgraph in a graph in the streaming setting. We study a simple algorithm for counting triangles. Consider a vector f = (fT), with one coordinate for each 3-subset T of [n], where for T = {x, y, z},

  fT = |{{x, y}, {x, z}, {y, z}} ∩ E|.

So if fT = 3 for some T = {x, y, z}, then {x, y, z} forms a triangle in G. The algorithm simply outputs F0 − 1.5·F1 + 0.5·F2, where Fi = ∥f∥_i^i (so F0 is the number of non-zero entries of f).

Algorithms for Big Data (VII) 16/17
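In a real stream the moments F0, F1, F2 would be estimated with sketches over updates to f; as a sanity check, here is an exact brute-force evaluation of the identity (function name is mine):

```python
from itertools import combinations

def triangle_count_via_moments(n, edges):
    # Exact (non-sketched) version of the identity: build f_T for
    # every 3-subset T of [n] and return F0 - 1.5*F1 + 0.5*F2.
    E = {frozenset(e) for e in edges}
    F0 = F1 = F2 = 0
    for T in combinations(range(n), 3):
        fT = sum(frozenset(p) in E for p in combinations(T, 2))
        F0 += fT != 0
        F1 += fT
        F2 += fT * fT
    return F0 - 1.5 * F1 + 0.5 * F2
```

On K4 every triple has fT = 3, giving F0 = 4, F1 = 12, F2 = 36 and output 4 − 18 + 18 = 4, the number of triangles.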

SLIDE 17

We can expand F0 − 1.5·F1 + 0.5·F2 as

  ∑_T (0.5·fT² − 1.5·fT + 1[fT ≠ 0]),

the sum ranging over all 3-subsets T of [n]. The "polynomial" f(x) = 0.5x² − 1.5x + 1[x ≠ 0] satisfies

▶ f(0) = f(1) = f(2) = 0;
▶ f(3) = 1,

so the sum counts exactly the triangles. However, when F0, F1, F2 are only estimated by sketches, the multiplicative error of the algorithm is unbounded!

Algorithms for Big Data (VII) 17/17