Sampling & Counting for Big Data (August 3, 2019)
Sampling vs Counting
For all self-reducible problems [Jerrum, Valiant, Vazirani ’86], the following tasks are poly-time equivalent (via poly-time Turing reductions):
• approx counting: estimate vol(Ω) (the partition function);
• exact or approx sampling: generate X = (X_1, X_2, …, X_n) ∼ Ω;
• approx inference: estimate the marginals Pr[X_i = · | X_S = σ].
MCMC Sampling
Markov chain for sampling X = (X_1, X_2, …, X_n) ∼ µ:
• Gibbs sampling (Glauber dynamics, heat-bath) [Glauber ’63] [Geman, Geman ’84]:
  pick a random vertex v; resample X_v ∼ µ_v( · | X_N(v) ).
• Metropolis-Hastings algorithm [Metropolis et al. ’53] [Hastings ’70]:
  pick a random i; propose a random value c; accept X_i ← c with probability min{1, µ(X′)/µ(X)}, where X′ is X with X_i set to c.
• Analysis: coupling methods [Aldous ’83] [Jerrum ’95] [Bubley, Dyer ’97] may give an O(n log n) upper bound on the mixing time.
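As a concrete illustration of the update rule above, here is a minimal single-site Metropolis sketch in Python. The unnormalized weight function `weight`, the domain, and the starting state are assumptions supplied by the caller (they are not part of the talk); the acceptance rule is the min{1, µ(X′)/µ(X)} stated above.

```python
import random

def metropolis_single_site(weight, x, domain, steps, rng=random):
    """Single-site Metropolis chain (sketch).

    weight(x) returns the unnormalized probability mu(x); start x from a
    state with positive weight.  x is a mutable list of current values,
    domain is the list of allowed values per coordinate.
    """
    n = len(x)
    w_cur = weight(x)
    for _ in range(steps):
        i = rng.randrange(n)            # pick a random coordinate i
        c = rng.choice(domain)          # propose a random value c
        old = x[i]
        x[i] = c
        w_new = weight(x)               # weight of the proposed state X'
        # accept with probability min{1, mu(X')/mu(X)}
        if w_new < w_cur and rng.random() >= w_new / w_cur:
            x[i] = old                  # reject: restore the old value
        else:
            w_cur = w_new               # accept
    return x
```

Gibbs sampling replaces the accept/reject step by resampling X_i directly from its conditional distribution given its neighbors.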
Computational Phase Transition
Hardcore model: graph G(V, E), max-degree Δ, fugacity λ > 0; approx sample an independent set I in G with probability ∝ λ^|I|.
Critical threshold: λ_c(Δ) = (Δ−1)^(Δ−1) / (Δ−2)^Δ.
• [Weitz, STOC ’06]: if λ < λ_c, an n^O(log Δ)-time algorithm.
• [Sly, FOCS ’10 best paper]: if λ > λ_c, NP-hard even for Δ = O(1).
• [Efthymiou, Hayes, Štefankovič, Vigoda, Y., FOCS ’16]: if λ < λ_c, O(n log n) mixing time, provided Δ is large enough and there is no small cycle.
[Plot: fugacity λ vs max-degree Δ; the curve λ_c(Δ) separates Easy (below) from Hard (above).]
A computational phase transition occurs at λ_c.
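For the hardcore model specifically, the heat-bath (Glauber dynamics) update has a simple closed form: a blocked vertex must be out of the set, and an unblocked vertex is in with probability λ/(1+λ). A minimal sketch, assuming the graph is given as an adjacency dict (an illustrative representation, not from the slides):

```python
import random

def hardcore_glauber(adj, lam, steps, rng=random):
    """Glauber dynamics (heat-bath) for the hardcore model (sketch).

    adj: dict mapping each vertex to a list of its neighbours;
    lam: fugacity lambda > 0.  Returns an indicator dict x with
    x[v] = 1 iff v is in the sampled independent set.
    """
    x = {v: 0 for v in adj}                 # start from the empty set
    vertices = list(adj)
    for _ in range(steps):
        v = rng.choice(vertices)            # pick a uniform random vertex
        if any(x[u] for u in adj[v]):
            x[v] = 0                        # blocked: v must be out
        else:
            # unblocked: v is in w.p. lambda / (1 + lambda)
            x[v] = 1 if rng.random() < lam / (1.0 + lam) else 0
    return x
```

The O(n log n) mixing bound of [Efthymiou et al., FOCS ’16] applies below λ_c under the stated conditions; the sketch itself simply runs for however many steps it is given.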
Big Data?
Sampling and Inference for Big Data
• Sampling from a joint distribution (specified by a probabilistic graphical model).
• Inference according to a probabilistic graphical model.
• The data (the probabilistic graphical model) is BIG.
• Parallel/distributed algorithms for sampling? ✓
  • PTIME ⟹ polylog(n) rounds
• For parallel/distributed computing: sampling ≡ approx counting/inference? ✓
  • PTIME ⟹ polylog(n) rounds
• Dynamic sampling algorithms? ✓
  • PTIME ⟹ polylog(n) incremental cost
Local Computation
“What can be computed locally?” [Naor, Stockmeyer, STOC ’93, SICOMP ’95]
The LOCAL model [Linial ’87]:
• Communication is synchronized.
• In each round: unlimited local computation and communication with neighbors.
• Complexity: # of rounds to terminate in the worst case.
• In t rounds: each node can collect information up to distance t.
PLOCAL: t = polylog(n)
“What can be sampled locally?”
• Joint distribution defined by local constraints:
  • Markov random field
  • Graphical model
• Sample a random solution from the joint distribution:
  • distributed algorithms (in the LOCAL model) on the network G(V, E)
Q: “What locally definable joint distributions are locally sample-able?”
MCMC Sampling
Classic MCMC sampling on G(V, E); Markov chain X_t → X_{t+1}:
  pick a uniform random vertex v;
  update X(v) conditioning on X(N(v));
O(n log n) time when rapidly mixing.
Parallelization (see the sketch below):
• Chromatic scheduler [folklore] [Gonzalez et al., AISTATS ’11]: vertices in the same color class are updated in parallel.
  • O(Δ log n) mixing time (Δ is the max degree).
• “Hogwild!” [Niu, Recht, Ré, Wright, NIPS ’11] [De Sa, Olukotun, Ré, ICML ’16]: all vertices are updated in parallel, ignoring concurrency issues.
  • Converges to the wrong distribution!
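A sketch of one chromatic-scheduler sweep, assuming a proper vertex coloring and a model-supplied `resample(v, x)` routine (both names are illustrative). Vertices of one color class form an independent set, so their conditional distributions do not depend on each other and they can be resampled concurrently.

```python
import random

def chromatic_gibbs_sweep(adj, coloring, resample, x, rng=random):
    """One sweep of chromatic-scheduler Gibbs sampling (sketch).

    coloring: dict vertex -> color of a proper coloring of the graph;
    resample(v, x): draws a new value for v from mu_v( . | x on N(v)).
    """
    colors = sorted(set(coloring.values()))
    for c in colors:
        color_class = [v for v in adj if coloring[v] == c]
        # conditionals within a color class depend only on other classes,
        # so compute all new values first (this is the parallel step) ...
        new_values = {v: resample(v, x) for v in color_class}
        # ... then write them back
        x.update(new_values)
    return x
```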
Crossing the Chromatic # Barrier
Sequential: O(n log n).  Parallel (chromatic scheduler): O(Δ log n).  Parallel speedup = Θ(n/Δ).
(Δ = max-degree, χ = chromatic number)
Adjacent vertices must not be updated simultaneously, so it takes ≥ χ steps to update every vertex at least once.
Q: “How to update all variables simultaneously and still converge to the correct distribution?”
Markov Random Fields (MRF)
∀ σ ∈ [q]^V:  μ(σ) ∝ ∏_{v ∈ V} ν_v(σ_v) · ∏_{e=(u,v) ∈ E} φ_e(σ_u, σ_v)
• Each vertex v ∈ V: a variable X_v over domain [q] with distribution ν_v.
• Each edge e = (u, v) ∈ E: a symmetric binary constraint φ_e : [q] × [q] → [0,1].
Defined on the network G(V, E).
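For reference, the unnormalized MRF weight (µ(σ) up to the normalizing constant) can be evaluated directly from the vertex and edge potentials; the container layout below is an assumption chosen for illustration.

```python
def mrf_weight(sigma, vertex_pot, edge_pot, edges):
    """Unnormalized MRF weight of a configuration sigma (dict v -> value).

    vertex_pot[v][c]        ~ nu_v(c)
    edge_pot[(u, v)][a][b]  ~ phi_e(a, b), assumed symmetric
    (container names are illustrative, not from the talk).
    """
    w = 1.0
    for v, c in sigma.items():
        w *= vertex_pot[v][c]           # vertex factors nu_v(sigma_v)
    for (u, v) in edges:
        w *= edge_pot[(u, v)][sigma[u]][sigma[v]]   # edge factors
    return w
```

Such a `weight` function is exactly what the single-site Metropolis sketch earlier takes as input.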
The Local-Metropolis Algorithm [Feng, Sun, Y., “What can be sampled locally?”, PODC ’17]
Markov chain X_t → X_{t+1} (current configuration X, proposals σ):
• each vertex v ∈ V independently proposes a random σ_v ∼ ν_v;
• each edge e = (u, v) passes its check independently with probability φ_e(X_u, σ_v) · φ_e(σ_u, X_v) · φ_e(σ_u, σ_v);
• each vertex v ∈ V updates X_v to σ_v if all its incident edges pass their checks.
• Local-Metropolis converges to the correct distribution µ.
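A minimal sketch of one Local-Metropolis round, using the same illustrative potential containers as above; every step (propose, per-edge check, conditional update) touches only a vertex and its incident edges, which is what makes the chain implementable in the LOCAL model.

```python
import random

def local_metropolis_round(adj, vertex_pot, edge_pot, x, rng=random):
    """One round of the Local-Metropolis chain for an MRF (sketch).

    vertex_pot[v]: dict value -> nu_v(value);
    edge_pot[(u, v)][a][b]: phi_e(a, b) (illustrative format, as above).
    """
    # propose: sigma_v ~ nu_v, independently at every vertex
    sigma = {}
    for v, dist in vertex_pot.items():
        values, weights = zip(*dist.items())
        sigma[v] = rng.choices(values, weights=weights)[0]

    # each edge passes its check independently with probability
    # phi_e(X_u, sigma_v) * phi_e(sigma_u, X_v) * phi_e(sigma_u, sigma_v)
    ok = {v: True for v in adj}
    for (u, v), phi in edge_pot.items():
        p = phi[x[u]][sigma[v]] * phi[sigma[u]][x[v]] * phi[sigma[u]][sigma[v]]
        if rng.random() >= p:            # edge fails its check
            ok[u] = ok[v] = False

    # a vertex adopts its proposal only if all incident edges passed
    for v in adj:
        if ok[v]:
            x[v] = sigma[v]
    return x
```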
The Local-Metropolis Algorithm [Feng, Sun, Y., “What can be sampled locally?”, PODC ’17]
(Same chain as above, for the MRF μ(σ) ∝ ∏_{v ∈ V} ν_v(σ_v) · ∏_{e=(u,v) ∈ E} φ_e(σ_u, σ_v).)
• Local-Metropolis converges to the correct distribution µ.
• Under the coupling condition for Metropolis-Hastings:
  • Metropolis-Hastings: O(n log n) time;
  • (lazy) Local-Metropolis: O(log n) time.
Lower Bounds [Feng, Sun, Y., “What can be sampled locally?”, PODC ’17]
• Approx sampling from any MRF requires Ω(log n) rounds.
  • For sampling, O(log n) rounds is the new criterion of “local”.
• If λ > λ_c(Δ) = (Δ−1)^(Δ−1) / (Δ−2)^Δ, sampling from the hardcore model requires Ω(diam) rounds.
  • Strong separation between sampling and other local computation tasks: an independent set is trivial to construct locally (e.g. ∅).
  • The lower bound holds not because of the locality of information, but because of the locality of correlation.
[Plot: fugacity λ vs max-degree Δ; Hard above λ_c(Δ), Easy below.]
• Parallel/distributed algorithms for sampling? ✓
  • PTIME ⟹ polylog(n) rounds
• For parallel/distributed computing: sampling ≡ approx counting/inference? ✓
  • PTIME ⟹ polylog(n) rounds
• Dynamic sampling algorithms? ✓
  • PTIME ⟹ polylog(n) incremental cost
Example: Sampling an Independent Set (hardcore model)
µ: distribution over independent sets I in G, with µ(I) ∝ λ^|I|.
• Y ∈ {0,1}^V indicates an independent set.
• Each v ∈ V returns a Y_v ∈ {0,1} such that Y = (Y_v)_{v ∈ V} ∼ µ,
• or: d_TV(Y, µ) < 1/poly(n).
The input is the network G(V, E) itself.
Inference (Local Counting)
µ: distribution over independent sets I in G, with µ(I) ∝ λ^|I|.
µ_v^σ: marginal distribution at v conditioning on σ ∈ {0,1}^S:
  ∀ y ∈ {0,1}:  µ_v^σ(y) = Pr_{Y∼µ}[ Y_v = y | Y_S = σ ].
• Each v ∈ S receives σ_v as input.
• Each v ∈ V returns a marginal distribution µ̂_v^σ such that d_TV(µ̂_v^σ, µ_v^σ) ≤ 1/poly(n).
Counting reduces to inference (Z is the partition function; see the sketch below):
  1/Z = µ(∅) = ∏_{i=1}^{n} Pr_{Y∼µ}[ Y_{v_i} = 0 | ∀ j < i: Y_{v_j} = 0 ].
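The telescoping identity above turns counting into a sequence of inference queries. A sketch, assuming a hypothetical conditional-marginal oracle `cond_marginal` (e.g. a local approximate-inference routine, not something defined in the slides):

```python
def partition_function_from_marginals(vertices, cond_marginal):
    """Counting via inference (sketch of the self-reducibility argument).

    cond_marginal(v, pinned) returns an estimate of
    Pr[Y_v = 0 | Y_u = 0 for all u in pinned]  -- a hypothetical oracle.
    Uses  1/Z = mu(empty set) = prod_i Pr[Y_{v_i}=0 | Y_{v_j}=0, j<i].
    """
    inv_z = 1.0
    pinned = []
    for v in vertices:
        inv_z *= cond_marginal(v, tuple(pinned))
        pinned.append(v)        # pin v to 0 for the later factors
    return 1.0 / inv_z
```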
Decay of Correlation
µ_v^σ: marginal distribution at v conditioning on σ ∈ {0,1}^S.
Strong spatial mixing (SSM): ∀ boundary condition B ∈ {0,1}^{r-sphere(v)}:
  d_TV( µ_v^σ, µ_v^{σ,B} ) ≤ poly(n) · exp(−Ω(r)).
• For the hardcore model, SSM holds iff λ ≤ λ_c.
• SSM ⟹ approx inference is solvable in O(log n) rounds in the LOCAL model.
[Figure: vertex v in G with boundary condition B on the r-sphere around v and condition σ further out.]
Locality of Counting & Sampling [Feng, Y., PODC ’18]
For all self-reducible graphical models:
• Correlation decay (SSM) ⟹ local approx inference with additive error (easy).
• Local approx inference with additive error ⟹ local approx sampling (with an O(log² n) factor overhead).
• Local approx inference with multiplicative error corresponds to local exact sampling (a distributed Las Vegas sampler).
Locality of Sampling
Correlation decay (SSM)
 ⟹ local approx inference: each v can compute, within an O(log n)-ball, a µ̂_v^σ such that d_TV(µ̂_v^σ, µ_v^σ) ≤ 1/poly(n)
 ⟹ local approx sampling: return a random Y = (Y_v)_{v ∈ V} whose distribution µ̂ satisfies d_TV(µ̂, µ) ≤ 1/poly(n).
Sequential O(log n)-local procedure (see the sketch below):
• scan the vertices of V in an arbitrary order v_1, v_2, …, v_n;
• for i = 1, 2, …, n: sample Y_{v_i} according to µ̂_{v_i}^{Y_{v_1}, …, Y_{v_{i−1}}}.
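A sketch of the sequential O(log n)-local procedure, assuming a hypothetical routine `local_marginal(v, assignment)` that inspects only the O(log n)-ball around v and returns the estimated conditional marginal; shown for a Boolean domain such as the hardcore model.

```python
import random

def sequential_local_sample(vertices, local_marginal, rng=random):
    """Sequential local sampling procedure (sketch).

    local_marginal(v, assignment): estimate of
    Pr[Y_v = 1 | Y agrees with `assignment` on already scanned vertices],
    computed from the O(log n)-ball around v (hypothetical routine).
    """
    assignment = {}
    for v in vertices:                       # arbitrary order v_1, ..., v_n
        p1 = local_marginal(v, dict(assignment))
        assignment[v] = 1 if rng.random() < p1 else 0
    return assignment
```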
Network Decomposition
(C, D)-network-decomposition of G:
• classifies vertices into clusters;
• assigns each cluster a color in [C];
• each cluster has diameter ≤ D;
• clusters are properly colored.
(C, D)_r-ND: a (C, D)-ND of the power graph G^r (here r = O(log n)).
Given a (C, D)_r-ND, the sequential r-local procedure
• scan the vertices of V in an arbitrary order v_1, v_2, …, v_n;
• for i = 1, 2, …, n: sample Y_{v_i} according to µ̂_{v_i}^{Y_{v_1}, …, Y_{v_{i−1}}};
can be simulated in O(CDr) rounds in the LOCAL model.
Network Decomposition
(C, D)-network-decomposition of G: classifies vertices into clusters; assigns each cluster a color in [C]; each cluster has diameter ≤ D; clusters are properly colored.
(C, D)_r-ND: a (C, D)-ND of the power graph G^r.
An (O(log n), O(log n))_r-ND can be constructed in O(r log² n) rounds w.h.p.
[Ghaffari, Kuhn, Maus, STOC ’17] (via network decomposition): an r-local SLOCAL algorithm that, for every ordering π = (v_1, v_2, …, v_n), returns a random vector Y(π), can be simulated by an O(r log² n)-round LOCAL algorithm that w.h.p. returns Y(π) for some ordering π.