What can be sampled locally?
Yitong Yin (Nanjing University)
Joint work with: Weiming Feng, Yuxin Sun
Local Computation
“What can be computed locally?” [Naor, Stockmeyer, STOC’93, SICOMP’95]
the LOCAL model [Linial ’87]:
• Communications are synchronized.
• In each round: each node can send messages of unbounded size to all its neighbors.
• Local computations are free.
• Complexity: # of rounds to terminate in the worst case.
• In t rounds: each node can collect information up to distance t.
Local Computation
the LOCAL model [Linial ’87]:
• In t rounds: each node can collect information up to distance t.
Locally Checkable Labeling (LCL) problems [Naor, Stockmeyer ’93]: CSPs with local constraints on a network G(V, E).
• Construct a feasible solution: vertex/edge coloring, Lovász local lemma.
• Find a local optimum: MIS, MM.
• Approximate a global optimum: maximum matching, minimum vertex cover, minimum dominating set.
Q: “What locally definable problems (defined by local constraints) are locally computable (in O(1) or a small number of rounds)?”
“What can be sampled locally?”
• CSP with local constraints on the network G(V, E): e.g. proper q-coloring, independent set.
• Sample a uniform random feasible solution by distributed algorithms (in the LOCAL model).
Q: “What locally definable joint distributions are locally sample-able?”
Markov Random Fields (MRF)
On a network G(V, E):
• Each vertex v ∈ V corresponds to a variable X_v with finite domain [q].
• Each edge e = (u, v) ∈ E imposes a weighted binary constraint A_e : [q]² → ℝ_{≥0}.
• Each vertex v ∈ V imposes a weighted unary constraint b_v : [q] → ℝ_{≥0}.
• Gibbs distribution µ: the random X ∈ [q]^V follows µ, where ∀σ ∈ [q]^V:
    µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v)
Markov Random Fields (MRF)
• Gibbs distribution µ on G(V, E): ∀σ ∈ [q]^V,
    µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v)
• proper q-coloring: A_e is the q×q matrix with 0 on the diagonal and 1 elsewhere; b_v is the all-ones vector.
• independent set: A_e = [[1,1],[1,0]], b_v = [1,1] (domain {0,1}).
• local conflict colorings [Fraigniaud, Heinrich, Kosowski FOCS’16]: A_e ∈ {0,1}^{q×q}, b_v ∈ {0,1}^q.
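To make the definition concrete, here is a small brute-force check (our own sketch, not from the talk) that the proper-q-coloring MRF above induces the uniform distribution over proper colorings; the graph, q, and function names are illustrative choices.

```python
# Brute-force Gibbs weights for the proper-q-coloring MRF on a triangle
# (illustrative sketch; the graph and names are our own choices).
from itertools import product

def gibbs_weight(sigma, edges, A, b):
    """Unnormalized weight: prod_e A_e(sigma_u, sigma_v) * prod_v b_v(sigma_v)."""
    w = 1.0
    for (u, v) in edges:
        w *= A[sigma[u]][sigma[v]]
    for x in sigma:
        w *= b[x]
    return w

q = 3
edges = [(0, 1), (1, 2), (0, 2)]                                # a triangle
A = [[0 if i == j else 1 for j in range(q)] for i in range(q)]  # 0 on the diagonal
b = [1] * q                                                     # all-ones unary weights

weights = {s: gibbs_weight(s, edges, A, b) for s in product(range(q), repeat=3)}
proper = [s for s, w in weights.items() if w > 0]
# every proper coloring has the same weight, so the Gibbs distribution is uniform
assert all(weights[s] == 1.0 for s in proper)
print(len(proper))  # the proper 3-colorings of a triangle
```

Exactly the configurations with positive weight are the proper colorings, and they all get equal weight, which is the sense in which the MRF "is" uniform sampling of feasible solutions.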
A Motivation: Distributed Machine Learning
• Data are stored in a distributed system.
• Sampling from a probabilistic graphical model (e.g. a Markov random field) by distributed algorithms.
Glauber Dynamics
On G(V, E): starting from an arbitrary X_0 ∈ [q]^V, transition X_t → X_{t+1}:
• pick a uniform random vertex v;
• resample X(v) according to the marginal distribution induced by µ at vertex v, conditioning on X_t(N(v)).
Marginal distribution (for the MRF µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) ∏_{v∈V} b_v(σ_v)):
    Pr[X_v = x | X_{N(v)}] = b_v(x) ∏_{u∈N(v)} A_{(u,v)}(X_u, x) / ∑_{y∈[q]} b_v(y) ∏_{u∈N(v)} A_{(u,v)}(X_u, y)
Stationary distribution: µ.
Mixing time: τ_mix = max_{X_0} min{ t : d_TV(X_t, µ) ≤ 1/(2e) }
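A minimal sequential sketch of one Glauber update, assuming a single shared edge matrix A and per-vertex weights b (our own encoding; all names are hypothetical):

```python
import random

def glauber_step(X, neighbors, A, b, q):
    """Pick a uniform random vertex v and resample X[v] from the
    conditional marginal Pr[X_v = x | X_{N(v)}] described on the slide."""
    v = random.randrange(len(X))
    # unnormalized marginal: b_v(x) * prod_{u in N(v)} A(X_u, x)
    weights = [b[v][x] for x in range(q)]
    for x in range(q):
        for u in neighbors[v]:
            weights[x] *= A[X[u]][x]
    X[v] = random.choices(range(q), weights=weights)[0]
    return X

# sanity check: for proper 3-coloring of a triangle, the marginal puts zero
# weight on the neighbors' colors, so the update never breaks properness
random.seed(1)
q = 3
neighbors = [[1, 2], [0, 2], [0, 1]]
A = [[0 if i == j else 1 for j in range(q)] for i in range(q)]
b = [[1] * q for _ in range(3)]
X = [0, 1, 2]
for _ in range(100):
    X = glauber_step(X, neighbors, A, b, q)
assert all(X[v] != X[u] for v in range(3) for u in neighbors[v])
```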
Mixing of Glauber Dynamics
Influence matrix {ρ_{v,u}}_{v,u∈V}: ρ_{v,u} is the max discrepancy (in total variation distance) of the marginal distributions at v caused by any pair σ, τ of boundary conditions that differ only at u.
Dobrushin’s condition (contraction of the one-step optimal coupling in the worst case w.r.t. Hamming distance):
    ‖ρ‖_∞ = max_{v∈V} ∑_{u∈V} ρ_{v,u} ≤ 1 − ε
Theorem (Dobrushin ’70; Salas, Sokal ’97): Dobrushin’s condition implies τ_mix = O(n log n) for Glauber dynamics.
For q-coloring: q ≥ (2+ε)Δ implies Dobrushin’s condition, where Δ = max-degree.
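As a sanity check (our own, not from the talk), the influence matrix for colorings can be computed by brute force on a tiny graph. For a triangle (Δ = 2) with q = 7 > 2Δ, the conditional marginal at a vertex is uniform over the colors unused by its neighbors, and the worst-case row sum comes out below 1:

```python
# Brute-force Dobrushin influence for proper q-coloring on a triangle
# (our own illustrative check; variable names are hypothetical).
from itertools import product

def coloring_marginal(q, neighbor_colors):
    """Conditional marginal at a vertex: uniform over colors unused by its neighbors."""
    allowed = [x for x in range(q) if x not in neighbor_colors]
    return [1.0 / len(allowed) if x in allowed else 0.0 for x in range(q)]

q = 7
# triangle: vertex v has neighbors u and w; a boundary condition is (color of u, color of w)
rho_vu = 0.0  # influence of u on v: max TV distance over boundaries differing only at u
for w_col in range(q):
    for c1, c2 in product(range(q), repeat=2):
        m1 = coloring_marginal(q, {c1, w_col})
        m2 = coloring_marginal(q, {c2, w_col})
        tv = sum(abs(a - b) for a, b in zip(m1, m2)) / 2
        rho_vu = max(rho_vu, tv)
# by symmetry rho_{v,w} = rho_{v,u}, and non-neighbors have zero influence,
# so the row sum is 2 * rho_vu; Dobrushin's condition asks for < 1
assert 2 * rho_vu < 1
print(rho_vu)
```

The worst case is two boundaries forbidding different color pairs, giving TV distance 1/(q − Δ) = 1/5 here, consistent with the q > 2Δ threshold.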
Parallelization
Glauber dynamics on G(V, E): starting from an arbitrary X_0 ∈ [q]^V, transition X_t → X_{t+1}:
• pick a uniform random vertex v;
• resample X(v) according to the marginal distribution induced by µ at vertex v, conditioning on X_t(N(v)).
Parallelization:
• Chromatic scheduler [folklore] [Gonzalez et al., AISTATS’11]: vertices in the same color class are updated in parallel.
• “Hogwild!” [Niu, Recht, Ré, Wright, NIPS’11] [De Sa, Olukotun, Ré, ICML’16]: all vertices are updated in parallel, ignoring concurrency issues.
Warm-up: When Luby meets Glauber
On G(V, E): starting from an arbitrary X_0 ∈ [q]^V, at each step, for each vertex v ∈ V:
• (Luby step) independently sample a random number β_v ∈ [0,1];
• (Glauber step) if β_v is locally maximum among its neighborhood N(v): resample X(v) according to the marginal distribution induced by µ at vertex v, conditioning on X_t(N(v)).
• Luby step: independently sample a random independent set.
• Glauber step: for the independent-set vertices, update correctly according to the current marginal distributions.
• Stationary distribution: the Gibbs distribution µ.
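The round above can be sketched under the same illustrative MRF encoding as a plain Glauber update (one shared edge matrix A, per-vertex weights b; names are our own):

```python
import random
from math import prod

def luby_glauber_round(X, neighbors, A, b, q):
    """Luby step: each vertex draws beta_v; local maxima form an independent set.
    Glauber step: those vertices resample from their conditional marginals in parallel."""
    n = len(X)
    beta = [random.random() for _ in range(n)]
    winners = [v for v in range(n) if all(beta[v] > beta[u] for u in neighbors[v])]
    newX = X[:]  # every winner reads the *old* configuration X
    for v in winners:
        weights = [b[v][x] * prod(A[X[u]][x] for u in neighbors[v]) for x in range(q)]
        newX[v] = random.choices(range(q), weights=weights)[0]
    return newX

# winners are pairwise non-adjacent, so a proper coloring stays proper
random.seed(2)
q = 4
neighbors = [[1], [0, 2], [1, 3], [2]]   # a path on 4 vertices
A = [[0 if i == j else 1 for j in range(q)] for i in range(q)]
b = [[1] * q for _ in range(4)]
X = [0, 1, 0, 1]
for _ in range(50):
    X = luby_glauber_round(X, neighbors, A, b, q)
    assert all(X[v] != X[u] for v in range(4) for u in neighbors[v])
```

Note the parallel semantics: all winners condition on the old X, which is safe precisely because the winners form an independent set.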
Mixing of LubyGlauber
Influence matrix {ρ_{v,u}}_{v,u∈V}; Dobrushin’s condition:
    ‖ρ‖_∞ = max_{v∈V} ∑_{u∈V} ρ_{v,u} ≤ 1 − ε
Theorem (Dobrushin ’70; Salas, Sokal ’97): Dobrushin’s condition implies τ_mix = O(n log n) for Glauber dynamics.
Theorem: Dobrushin’s condition implies τ_mix = O(Δ log n) for the LubyGlauber chain.
Influence matrix {ρ_{v,u}}_{v,u∈V}; Dobrushin’s condition: ‖ρ‖_∞ = max_{v∈V} ∑_{u∈V} ρ_{v,u} ≤ 1 − ε.
Claim: Dobrushin’s condition implies τ_mix = O(Δ log n) for the LubyGlauber chain.
Proof (similar to [Hayes ’04] [Dyer, Goldberg, Jerrum ’06]): in the one-step optimal coupling (X_t, Y_t), let p_v^{(t)} = Pr[X_t(v) ≠ Y_t(v)]. Then
    p^{(t+1)} ≤ M p^{(t)},  where M = (I − D) + Dρ,
with D diagonal and D_{v,v} = Pr[v is picked in the Luby step] ≥ 1/(deg(v)+1). Hence
    Pr[X_t ≠ Y_t] ≤ ‖p^{(t)}‖_1 ≤ n‖p^{(t)}‖_∞ ≤ n‖M‖_∞^t ‖p^{(0)}‖_∞ ≤ n(1 − ε/(Δ+1))^t.
Crossing the Chromatic # Barrier
Glauber: τ_mix = O(n log n); LubyGlauber: τ_mix = O(Δ log n); parallel speedup = Θ(n/Δ). (Δ = max-degree, χ = chromatic number.)
LubyGlauber does not update adjacent vertices simultaneously, so it takes ≥ χ steps to update all vertices at least once.
Q: “How to update all variables simultaneously and still converge to the correct distribution?”
The LocalMetropolis Chain
Starting from an arbitrary X ∈ [q]^V, at each step:
• each vertex v ∈ V independently proposes a random σ_v ∈ [q] with probability b_v(σ_v) / ∑_{i∈[q]} b_v(i);
• each edge e = (u, v) passes its check independently (a collective coin flip made between u and v) with probability
    A_e(X_u, σ_v) · A_e(σ_u, X_v) · A_e(σ_u, σ_v) / max_{i,j∈[q]} (A_e(i,j))³;
• each vertex v ∈ V accepts its proposal and updates X_v to σ_v if all its incident edges pass their checks.
• [Feng, Sun, Y. ’17]: the LocalMetropolis chain is time-reversible w.r.t. the MRF Gibbs distribution µ.
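A sequential sketch of one LocalMetropolis step following the three phases above, assuming one shared edge matrix A for all edges (an illustrative simplification; names are our own):

```python
import random

def local_metropolis_step(X, edges, neighbors, A, b, q):
    n = len(X)
    # 1. every vertex independently proposes sigma_v with prob. proportional to b_v
    sigma = [random.choices(range(q), weights=b[v])[0] for v in range(n)]
    # 2. every edge (u, v) flips one collective coin: it "passes" with prob.
    #    A(X_u, sigma_v) * A(sigma_u, X_v) * A(sigma_u, sigma_v) / (max A)^3
    Amax = max(max(row) for row in A)
    passed = set()
    for (u, v) in edges:
        p = A[X[u]][sigma[v]] * A[sigma[u]][X[v]] * A[sigma[u]][sigma[v]] / Amax ** 3
        if random.random() < p:
            passed.add((u, v))
    # 3. a vertex accepts its proposal iff all incident edges passed their checks
    newX = X[:]
    for v in range(n):
        if all((u, v) in passed or (v, u) in passed for u in neighbors[v]):
            newX[v] = sigma[v]
    return newX

# for proper colorings (A zero on the diagonal), a check can only pass when the
# relevant pairs are non-monochromatic, so the chain never leaves proper colorings
random.seed(3)
q = 5
edges = [(0, 1), (1, 2), (0, 2)]
neighbors = [[1, 2], [0, 2], [0, 1]]
A = [[0 if i == j else 1 for j in range(q)] for i in range(q)]
b = [[1] * q for _ in range(3)]
X = [0, 1, 2]
for _ in range(100):
    X = local_metropolis_step(X, edges, neighbors, A, b, q)
    assert all(X[v] != X[u] for v in range(3) for u in neighbors[v])
```

In the LOCAL model the edge coin is flipped jointly by the two endpoints (shared randomness over one edge), so the whole step is one communication round; the sketch above just simulates it sequentially.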
Detailed Balance Equation: ∀X, Y ∈ [q]^V, µ(X) P(X,Y) = µ(Y) P(Y,X).
σ ∈ [q]^V: the proposals of all vertices; C ∈ {0,1}^E: indicates whether each edge e ∈ E passes its check.
Ω_{X→Y} ≜ {(σ, C) | X → Y when the random choice is (σ, C)}. It suffices to show
    P(X,Y) / P(Y,X) = ∑_{(σ,C)∈Ω_{X→Y}} Pr(σ) Pr(C | σ, X) / ∑_{(σ,C)∈Ω_{Y→X}} Pr(σ) Pr(C | σ, Y) = µ(Y) / µ(X).
A bijection φ_{X,Y} : Ω_{X→Y} → Ω_{Y→X}, (σ, C) ↦ (σ′, C′), is constructed as: C′ = C, and
    σ′_v = X_v if C_e = 1 for all e incident with v; σ′_v = σ_v otherwise.
It satisfies
    Pr(σ) Pr(C | σ, X) / Pr(σ′) Pr(C′ | σ′, Y) = ∏_{v∈V} b_v(Y_v)/b_v(X_v) · ∏_{e=uv∈E} A_e(Y_u,Y_v)/A_e(X_u,X_v) = µ(Y) / µ(X).
LocalMetropolis for the Hardcore Model
The hardcore model on G(V, E) with fugacity λ: for every independent set I in G,
    µ(I) = λ^{|I|} / ∑_{I′: IS in G} λ^{|I′|}
Starting from an arbitrary X ∈ {0,1}^V (with 1 indicating occupied), at each step, each vertex v ∈ V:
• proposes a random σ_v ∈ {0,1} independently, with σ_v = 1 with probability λ/(1+λ) and σ_v = 0 with probability 1/(1+λ);
• accepts the proposal and updates X_v to σ_v unless for some neighbor u of v: X_u = σ_v = 1, or σ_u = X_v = 1, or σ_u = σ_v = 1.
• λ < 1/Δ: τ_mix = O(log n), even for unbounded Δ.
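The specialized chain above can be sketched directly (our own code; X[v] = 1 means v is occupied):

```python
import random

def hardcore_step(X, neighbors, lam):
    """One LocalMetropolis step for the hardcore model with fugacity lam."""
    n = len(X)
    # every vertex proposes occupied (1) with probability lam / (1 + lam)
    sigma = [1 if random.random() < lam / (1 + lam) else 0 for _ in range(n)]
    newX = X[:]
    for v in range(n):
        # reject if some neighbor u has X_u = sigma_v = 1, sigma_u = X_v = 1,
        # or sigma_u = sigma_v = 1; otherwise accept the proposal
        if any(X[u] == 1 == sigma[v] or sigma[u] == 1 == X[v] or sigma[u] == 1 == sigma[v]
               for u in neighbors[v]):
            continue
        newX[v] = sigma[v]
    return newX

# starting from an independent set, the configuration stays an independent set
random.seed(4)
neighbors = [[1], [0, 2], [1, 3], [2]]   # a path on 4 vertices, Delta = 2
X = [0, 0, 0, 0]
for _ in range(300):
    X = hardcore_step(X, neighbors, 0.4)  # lam = 0.4 < 1/Delta
    assert all(not (X[v] and X[u]) for v in range(4) for u in neighbors[v])
```

The three rejection cases rule out every way two adjacent vertices could end up simultaneously occupied, which is why the independent-set invariant is preserved.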