Topics in Algorithms and Data Science
Random Graphs (2nd part)
Omid Etesami
Phase transitions for CNF-SAT
Phase transitions for other random structures
• We already saw phase transitions for random graphs.
• Other random structures, such as Boolean formulas in conjunctive normal form (CNF), also have phase transitions.
Random k-CNF formula
• n variables
• m clauses
• k literals per clause (k constant)
• literal = a variable or its negation
• Each clause is chosen independently and uniformly from the possible clauses.
• Unsatisfiability is an increasing property, so it has a phase transition.
Satisfiability conjecture
• Conjecture. There is a constant r_k such that m = r_k·n is a sharp threshold for satisfiability.
• The conjecture was recently proved for large k by Ding, Sly, and Sun!
Upper bound on r_k
• Let m = cn.
• Each truth assignment satisfies the CNF with probability (1 − 2^(−k))^(cn).
• By the union bound, the probability that the CNF is satisfiable is at most 2^n·(1 − 2^(−k))^(cn).
• This tends to 0 when c > 2^k ln 2. Thus r_k ≤ 2^k ln 2.
[Figure: 3-SAT solution space (height represents # of unsatisfied constraints)]
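The union-bound calculation above can be checked empirically for small n. A sketch (the helper names random_kcnf and satisfiable are ours, not from the lecture):

```python
import itertools
import random

def random_kcnf(n, m, k, rng):
    # a clause = k distinct variables, each negated with probability 1/2
    return [[(v + 1) * rng.choice((-1, 1)) for v in rng.sample(range(n), k)]
            for _ in range(m)]

def satisfiable(n, clauses):
    # brute force over all 2^n truth assignments (feasible only for small n)
    for bits in itertools.product((False, True), repeat=n):
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in c) for c in clauses):
            return True
    return False

n, k, c = 8, 3, 8.0                # c chosen above 2^k ln 2 ≈ 5.55
m = int(c * n)                     # m = 64 clauses
bound = 2**n * (1 - 2**-k)**m      # first-moment upper bound on P(satisfiable)
rng = random.Random(0)
trials = 50
sat = sum(satisfiable(n, random_kcnf(n, m, k, rng)) for _ in range(trials))
print(f"first-moment bound {bound:.4f}, empirical sat fraction {sat/trials:.2f}")
```

Here the bound is about 0.05, so almost all sampled formulas should be unsatisfiable.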
Lower bound on r_k
• The lower bound is more difficult; the 2nd moment method doesn't work.
• We focus on k = 3.
• The Smallest Clause (SC) heuristic finds a satisfying assignment almost surely when m = cn for any constant c < 2/3. Thus r_3 ≥ 2/3.
Smallest Clause (SC) heuristic
  While not all clauses are satisfied:
    assign true to a random literal in a random smallest-length clause;
    delete satisfied clauses; delete falsified literals.
  If a 0-length clause is ever produced, the heuristic has failed.
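A minimal Python sketch of the SC heuristic (literals are signed integers ±v; the function names are ours):

```python
import random

def sc_heuristic(clauses, n, rng):
    # Smallest Clause heuristic: repeatedly set true a random literal
    # from a random smallest clause, then simplify.
    clauses = [set(c) for c in clauses]
    assignment = {}
    while clauses:
        shortest = min(len(c) for c in clauses)
        if shortest == 0:
            return None                        # a 0-length clause: failure
        clause = rng.choice([c for c in clauses if len(c) == shortest])
        lit = rng.choice(sorted(clause))
        assignment[abs(lit)] = lit > 0         # make the chosen literal true
        # delete satisfied clauses; delete the falsified literal elsewhere
        clauses = [c - {-lit} for c in clauses if lit not in c]
    for v in range(1, n + 1):                  # untouched variables: any value
        assignment.setdefault(v, True)
    return assignment

# random 3-CNF with m = cn for c = 0.5 < 2/3: SC succeeds with high probability
rng = random.Random(1)
n, m = 60, 30
cnf = [[(v + 1) * rng.choice((-1, 1)) for v in rng.sample(range(n), 3)]
       for _ in range(m)]
a = sc_heuristic(cnf, n, rng)
if a is not None:
    print("satisfied:", all(any(a[abs(l)] == (l > 0) for l in c) for c in cnf))
```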
Queue of 1-literal and 2-literal clauses
• While the queue is not empty, a member of the queue is satisfied at each step.
• Setting a literal to true may add other clauses to the queue.
• We will show that while the queue is non-empty, the arrival rate is less than the departure rate.
Principle of deferred decisions
• We pretend that we do not know the literals appearing in each clause.
• During the algorithm, we only know the size of each clause.
Queue arrival rate
• When the t-th literal is assigned a value, each remaining 3-literal clause is added to the queue with probability 3/(2(n − t + 1)).
• (With the same probability, the clause is satisfied instead.)
• Therefore, the expected # of clauses added to the queue at each step is at most 3(cn − t + 1)/(2(n − t + 1)) ≤ 3c/2 = 1 − Ω(1), since c < 2/3.
The waiting time is O(lg n)
Thm. The probability that some clause remains in the queue for Ω(lg n) steps is at most 1/n^3.
• The probability that the queue is empty at step t and remains non-empty in steps t, t + 1, …, t + s − 1 is at most exp(−Ω(s)) by the multiplicative Chernoff bound: the # of arrivals must be at least s, while the mean # of arrivals is s(1 − Ω(1)). (We upper-bound the # of arrivals by a sum of independent Bernoullis.)
• There are only n choices for t. Therefore, for a suitable choice of s_0 = Θ(lg n), every non-empty episode has length at most s_0 with probability 1 − 1/n^3.
The probability that setting a literal in the i-th clause makes the j-th clause false is o(1/n^2)
If this trouble happens, then:
• either the i-th or the j-th clause is added to the queue at some step t,
• the j-th clause consists of 1 literal when the trouble happens,
• by the SC rule, the i-th clause also consists of 1 literal when its literal is assigned,
• with probability 1 − 1/n^3, the waiting time for both clauses is O(lg n).
If a_1, a_2, … is the sequence of literals that would be set to true (if clauses i and j didn't exist), then 4 of the literals in these two clauses are negations of literals among a_t, a_{t+1}, …, a_{t'} for t' = t + O(lg n). This happens with probability O((ln^4 n)/n^4) times the # of choices for t.
Since there are O(n^2) pairs of clauses, the algorithm fails with probability o(1) by the union bound.
Nonuniform models of random graphs
Nonuniform models
• Fix a degree distribution: there are f(d) vertices of degree d.
• Choose a random graph uniformly among all graphs with this degree distribution.
• Edges are no longer independent.
Degree distribution: vertex perspective vs. edge perspective
• Consider a graph where half of the vertices have degree 1 and half have degree 2.
• A random vertex is equally likely to have degree 1 or 2.
• A random endpoint of a random edge is twice as likely to have degree 2.
• In many algorithms, we traverse a random edge to reach an endpoint: the probability of reaching a vertex of degree i is then proportional to i·λ_i, where λ_i is the fraction of vertices of degree i.
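The edge-perspective distribution can be computed directly. A small sketch (the function name is ours):

```python
def edge_perspective(vertex_dist):
    # probability of reaching degree i over a random edge: i*λ_i / Σ_j j*λ_j
    z = sum(i * p for i, p in vertex_dist.items())
    return {i: i * p / z for i, p in vertex_dist.items()}

vertex = {1: 0.5, 2: 0.5}        # half the vertices of degree 1, half of degree 2
print(edge_perspective(vertex))  # degree 2 becomes twice as likely as degree 1
```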
Giant component in random graphs with given degree distribution
[Molloy, Reed] There is a giant component iff Σ_i i(i − 2)·λ_i > 0
• Intuition: Consider BFS (a branching process) from a fixed vertex.
• After the first level, a vertex of degree i has exactly i − 1 children.
• The branching process has extinction probability < 1 iff the expected # of children satisfies E[i − 1] > 1, in other words E[i − 2] > 0.
• In calculating this expectation, the probability of degree i is taken from the edge perspective (not the vertex perspective), so it is proportional to i·λ_i. This gives the condition Σ_i i(i − 2)·λ_i > 0.
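A sketch that evaluates the Molloy–Reed quantity Σ_i i(i − 2)·λ_i for two distributions; for a Poisson(d) degree distribution it equals d² − d, matching the G(n, d/n) threshold d = 1 (function names are ours):

```python
import math

def molloy_reed(vertex_dist):
    # a giant component exists iff this sum is positive (Molloy–Reed criterion)
    return sum(i * (i - 2) * p for i, p in vertex_dist.items())

def poisson(d, kmax=60):
    # Poisson degree distribution, truncated at kmax
    return {k: math.exp(-d) * d**k / math.factorial(k) for k in range(kmax + 1)}

print(molloy_reed({1: 0.5, 2: 0.5}))   # -0.5 < 0: no giant component
print(molloy_reed(poisson(1.5)))       # d^2 - d = 0.75 > 0: giant component
print(molloy_reed(poisson(0.5)))       # -0.25 < 0: no giant component
```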
Example: G(n, p=1/n)
Poisson degree distribution
If vertices have a Poisson degree distribution with mean d, then a random endpoint of a random edge has degree distribution 1 + Poisson(d).
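This identity is exact for the Poisson distribution and easy to verify term by term (a sketch):

```python
import math

def pois(k, d):
    return math.exp(-d) * d**k / math.factorial(k)

d = 2.0
for k in range(1, 10):
    size_biased = k * pois(k, d) / d     # edge-perspective probability of degree k
    # k * p_k / d = e^{-d} d^{k-1} / (k-1)!  =  Poisson(d) evaluated at k - 1
    assert abs(size_biased - pois(k - 1, d)) < 1e-12
print("size-biased Poisson(d) shifts to 1 + Poisson(d)")
```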
Growth model without preferential attachment
Growing graphs
• Vertices and edges are added over time, with or without preferential attachment.
• Preferential attachment = selecting endpoints for a new edge with probability proportional to degrees.
• Without preferential attachment = selecting endpoints for a new edge uniformly at random from the set of existing vertices.
Basic growth model without preferential attachment
• Start with zero vertices and zero edges.
• At each time t, add a new vertex.
• With probability δ, join two random existing vertices by an edge.
The resulting graph may become a multigraph. But since there are Θ(t^2) pairs of vertices and O(t) existing edges, a multiple edge or self-loop occurs at each step with small probability, and we ignore these cases.
# vertices of each degree
Let d_k(t) be the expected # of vertices of degree k at time t.
• Each step adds one new vertex of degree 0; with probability δ, each of the two uniformly random endpoints of the new edge moves up one degree. This gives
  d_0(t+1) = d_0(t) + 1 − 2δ·d_0(t)/t,
  d_k(t+1) = d_k(t) + 2δ·(d_{k−1}(t) − d_k(t))/t for k ≥ 1.
Degree distribution
Let d_k(t) = p_k·t in the limit as t tends to infinity. Solving the steady state gives the geometric distribution p_k = (2δ)^k / (1 + 2δ)^(k+1), which, like the Poisson distribution of Erdős–Rényi graphs, falls off exponentially fast, unlike the power law of preferential attachment.
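A sketch that iterates the expectation recurrence (our reconstruction: each step adds a degree-0 vertex, and with probability δ each of the two uniform edge endpoints moves a vertex from degree k − 1 to k) and compares d_k(t)/t with the geometric limit p_k = (2δ)^k/(1 + 2δ)^(k+1):

```python
delta = 0.5
T = 100_000
kmax = 30
d = [0.0] * (kmax + 1)     # d[k] ~ expected # of vertices of degree k
d[0] = 1.0                 # time 1: a single isolated vertex
for t in range(1, T):
    nd = d[:]
    nd[0] += 1.0                              # the new vertex has degree 0
    nd[0] -= 2 * delta * d[0] / t             # endpoints promote degree-0 vertices
    for k in range(1, kmax + 1):              # ... and move degree k-1 up to k
        nd[k] += 2 * delta * (d[k - 1] - d[k]) / t
    d = nd

for k in range(5):
    p_k = (2 * delta) ** k / (1 + 2 * delta) ** (k + 1)
    print(k, round(d[k] / T, 4), round(p_k, 4))
```

The printed pairs should agree to several decimal places, supporting the geometric limit.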
# components of each finite size
Let n_k(t) be the expected # of components of size k at time t.
• A randomly picked component has size k with probability proportional to n_k(t).
• A randomly picked vertex lies in a component of size k with probability proportional to k·n_k(t).
[Figure: components of size 4 and 2]
Recurrence relation for n_k(t)
• A component of size k is created when the new edge merges components of sizes j and k − j; a component of size k is destroyed when an endpoint of the new edge lands in it. This gives
  n_k(t+1) = n_k(t) + 1_{k=1} + δ·[ Σ_{j=1}^{k−1} (j·n_j(t)/t)·((k−j)·n_{k−j}(t)/t) − 2k·n_k(t)/t ].
• Note: we use expectations instead of the actual # of components of each size.
• We ignore edges falling inside a component, since we are interested in small component sizes.
[Figure: merging components of j and k − j vertices]
Recurrence relation for a_k = n_k(t)/t
Substituting n_k(t) = a_k·t in the recurrence and letting t → ∞:
  a_1 = 1/(1 + 2δ),
  a_k·(1 + 2δk) = δ·Σ_{j=1}^{k−1} j·a_j·(k−j)·a_{k−j} for k ≥ 2.
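Taking limits in the recurrence for n_k(t) suggests (our reconstruction) a_1 = 1/(1 + 2δ) and a_k·(1 + 2δk) = δ·Σ_{j=1}^{k−1} j·a_j·(k−j)·a_{k−j}. A sketch that computes a_k numerically; for δ below the critical value, finite components should account for essentially all vertices, i.e. Σ_k k·a_k ≈ 1:

```python
delta = 0.05               # well below the critical value 1/8
K = 1500
a = [0.0] * (K + 1)        # a[k] ~ n_k(t)/t as t -> infinity
a[1] = 1.0 / (1 + 2 * delta)
for k in range(2, K + 1):
    # gain from merging components of sizes j and k - j
    s = sum(j * a[j] * (k - j) * a[k - j] for j in range(1, k))
    a[k] = delta * s / (1 + 2 * delta * k)

mass = sum(k * a[k] for k in range(1, K + 1))
print(f"fraction of vertices in finite components: {mass:.6f}")
```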
Phase transition for non-finite components
Size of non-finite components below critical threshold
Summary of phase transition
Comparison with a static random graph having the same degree distribution
• Question: why does the giant component appear for smaller δ in the grown model than in the static model?
Why is δ = 1/4 the threshold for the static model?
• Apply the Molloy–Reed criterion Σ_i i(i − 2)·p_i > 0 to the geometric degree distribution of the grown graph; the criterion changes sign exactly at δ = 1/4.
Growth model with preferential attachment
Description of the model
• Begin with an empty graph.
• At each time step, add a new vertex and, with probability δ, attach the new vertex by an edge to an existing vertex selected at random with probability proportional to its degree.
• Since each new vertex brings at most one edge, the graph is a forest: it has no cycles.
Degree of vertex i at time t
Let d_i(t) be the (expected) degree of vertex i at time t. The total degree at time t is 2δt, so vertex i gains degree at rate δ·d_i(t)/(2δt) = d_i(t)/(2t):
  d/dt d_i(t) = d_i(t)/(2t).
Thus d_i(t) = a·t^(1/2). Since d_i(i) = δ, we have d_i(t) = δ·(t/i)^(1/2).
Power-law degree distribution
From d_i(t) = δ·(t/i)^(1/2), vertex number tδ^2/d^2 has degree d, so the # of vertices of degree at least d is tδ^2/d^2. Therefore, the # of vertices of degree exactly d is approximately
  tδ^2·(1/d^2 − 1/(d+1)^2) ≈ 2tδ^2/d^3.
In other words, the probability of degree d is 2δ^2/d^3.
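A quick numeric check (a sketch) that differencing the count tδ²/d² of vertices with degree ≥ d indeed gives ≈ 2tδ²/d³ per degree value:

```python
delta, t = 0.9, 1_000_000
for d in (5, 10, 20, 40):
    exact = t * delta**2 * (1 / d**2 - 1 / (d + 1)**2)   # degrees in [d, d+1)
    approx = 2 * t * delta**2 / d**3                     # power-law approximation
    print(d, round(exact), round(approx), round(exact / approx, 3))
```

The ratio of the two counts approaches 1 as d grows.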
Small world graphs
Milgram's experiment
• Ask a person in Nebraska to send a letter to a person in Massachusetts, given the target's address and occupation.
• At each step, the current holder sends the letter to someone they know on a "first name" basis who is closer to the target.
• In successful experiments, it took 5 or 6 steps.
• This is the origin of the phrase "six degrees of separation".
The Kleinberg model for random graphs
• n × n grid with local and global (long-distance) edges.
• From each vertex u, there is one long-distance edge to a vertex v.
• Vertex v is chosen with probability proportional to d(u,v)^(−r), where d is the Manhattan distance.
Normalization factor
• Let c_r(u) = Σ_{v ≠ u} d(u,v)^(−r).
• The # of nodes at distance k from u is at most 4k.
• The # of nodes at distance k from u is at least k for k ≤ n/2.
• We have:
  c_r(u) = Θ(1) when r > 2,
  c_r(u) = Θ(lg n) when r = 2,
  c_r(u) = Ω(n^(2−r)) when r < 2.
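The three regimes can be checked by computing c_r(u) exactly on small grids (a sketch; we take u at the grid center):

```python
def c_r(n, r):
    # normalization: sum over all v != u of d(u,v)^(-r), u at the grid center
    ux, uy = n // 2, n // 2
    return sum((abs(x - ux) + abs(y - uy)) ** -r
               for x in range(n) for y in range(n) if (x, y) != (ux, uy))

for r in (1, 2, 3):
    # r=3: roughly constant; r=2: grows like lg n; r=1: grows linearly in n
    print("r =", r, [round(c_r(n, r), 2) for n in (50, 100, 200)])
```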
No short (polylogarithmic) paths exist when r > 2
• Each vertex has one long-distance edge, so the expected # of edges connecting vertices at distance ≥ d* is
  n^2 · Σ_{k ≥ d*} O(k·k^(−r))/c_r(u) = O(n^2 · (d*)^(2−r)).
• Thus, with high probability there is no edge connecting vertices at distance at least d*, for some d* = n^(1−Ω(1)).
• Since many pairs of vertices are at distance Ω(n) from each other, the shortest path between such pairs has length at least Ω(n)/d* = n^(Ω(1)).
[Figure: a pair of vertices at distance Ω(n)]