Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling
Christopher De Sa, Kunle Olukotun, Christopher Ré
{cdesa,kunle,chrismre}@stanford.edu, Stanford
Overview
Asynchronous Gibbs sampling is a popular algorithm used in practical ML systems (e.g. Zhang et al., PVLDB 2014; Smola et al., PVLDB 2010).

Question: when and why does it work?

"Folklore" says that asynchronous Gibbs sampling basically works whenever standard (sequential) Gibbs sampling does, but there is no theoretical guarantee.

Our contributions:
1. The "folklore" is not necessarily true.
2. ...but asynchronous Gibbs does work under reasonable conditions.
Problem: given a probability distribution π, produce samples from it.
• e.g. to do inference in a graphical model

Algorithm: Gibbs sampling
• the de facto Markov chain Monte Carlo (MCMC) method for inference
• produces a series of approximate samples that approach the target distribution
What is Gibbs Sampling?

Algorithm 1 Gibbs sampling
Require: Variables x_i for 1 ≤ i ≤ n, and distribution π.
loop
    Choose s by sampling uniformly from {1, ..., n}.
    Re-sample x_s from P_π(x_s | x_{{1,...,n}\{s}}).
    output x
end loop

Each pass of the loop: (1) choose a variable to update at random; (2) compute its conditional distribution given the other variables; (3) update the variable by sampling from that conditional distribution; (4) output the current state as a sample. A runnable sketch of this loop is given below.

[Figure: a graphical model over variables x_1, ..., x_7; in the walkthrough, x_5 is selected and re-sampled from its conditional distribution, which puts probability 0.7 on one value and 0.3 on the other.]
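To make the loop concrete, here is a minimal, self-contained sketch of sequential Gibbs sampling. This is our illustration, not code from the talk: it assumes a small Ising-style model on binary variables x_i ∈ {−1, +1} with π(x) ∝ exp(θ Σ_{(i,j)∈E} x_i x_j), and the edge set and θ are hypothetical choices made only for the example.

```python
import math
import random

def gibbs_step(x, edges, theta):
    """One sequential Gibbs update on an Ising-style model (illustrative)."""
    n = len(x)
    s = random.randrange(n)  # choose a variable to update at random
    # The conditional of x_s given the rest depends only on its neighbors.
    field = sum(theta * x[j] for (i, j) in edges if i == s)
    field += sum(theta * x[i] for (i, j) in edges if j == s)
    # P(x_s = +1 | rest) = e^field / (e^field + e^-field)
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
    x[s] = 1 if random.random() < p_plus else -1  # re-sample x_s
    return x  # output the current state as a sample

# usage: a 4-cycle of binary variables
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
x = [random.choice([-1, 1]) for _ in range(4)]
for _ in range(1000):
    x = gibbs_step(x, edges, theta=0.5)
print(x)
```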
Gibbs Sampling: A Practical Perspective

• Pros of Gibbs sampling
  – easy to implement
  – updates are sparse → fast on modern CPUs
• Cons of Gibbs sampling
  – sequential algorithm → can't naively parallelize

With no parallelism, on e.g. a 64-core machine, we leave up to 98% of the performance on the table!
Asynchronous Gibbs Sampling

• Run multiple threads in parallel without locks
  – also known as HOGWILD!
  – adapted from a popular technique for stochastic gradient descent (SGD)
• When we read a variable, it could be stale
  – while we re-sample a variable, its adjacent variables can be overwritten by other threads
  – semantics not equivalent to standard (sequential) Gibbs sampling

(A minimal sketch of the lock-free scheme follows below.)
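Below is a minimal sketch of the lock-free scheme, again our illustration on the same toy Ising model as above, not the authors' implementation. Each thread runs ordinary Gibbs updates against the shared state with no synchronization, so reads can be stale. (CPython's GIL limits true parallelism here; the sketch shows the semantics of stale reads and lock-free writes, not the speedup.)

```python
import math
import random
import threading

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # illustrative model, as before
theta = 0.5
x = [random.choice([-1, 1]) for _ in range(4)]  # shared state, no locks

def worker(steps):
    for _ in range(steps):
        s = random.randrange(len(x))
        # Read neighbors without locking: another thread may overwrite
        # them while we compute the conditional (stale reads).
        field = sum(theta * x[j] for (i, j) in edges if i == s)
        field += sum(theta * x[i] for (i, j) in edges if j == s)
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
        x[s] = 1 if random.random() < p_plus else -1  # lock-free write

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(x)
```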
Question
Does asynchronous Gibbs sampling work? …and what does it mean for it to work?

Two desiderata:
• want to get accurate estimates → bound the bias
• want to become independent of initial conditions quickly → bound the mixing time
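For reference, the standard definition of mixing time (our addition, not from the slides; the notation matches the total variation distance defined on the Bias slide below):

```latex
% Mixing time: the first time t at which, from the worst-case initial
% distribution mu_0, the chain's distribution is within epsilon of the
% target pi in total variation distance.
t_{\mathrm{mix}}(\epsilon)
  = \min\Big\{ t : \max_{\mu_0} \big\| P^{(t)} \mu_0 - \pi \big\|_{\mathrm{TV}} \le \epsilon \Big\}
```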
Previous Work

• "Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent" — Niu et al., NIPS 2011.
  – follow-up work: Liu and Wright, SIOPT 2015; Liu et al., JMLR 2015; De Sa et al., NIPS 2015; Mania et al., arXiv 2015
• "Analyzing Hogwild Parallel Gaussian Gibbs Sampling" — Johnson et al., NIPS 2013.
Bias

• How close are the samples to the target distribution?
  – standard measurement: total variation distance
      ‖μ − ν‖_TV = max_{A ⊆ Ω} |μ(A) − ν(A)|
• For sequential Gibbs, there is no asymptotic bias:
      ∀ μ_0,  lim_{t→∞} ‖P^(t) μ_0 − π‖_TV = 0

"Folklore": asynchronous Gibbs is also unbiased. …but this is not necessarily true!
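For a finite state space Ω, the maximum over subsets A equals half the L1 distance between the distributions, which gives a one-line empirical check. This helper is our own illustration, not from the slides:

```python
def tv_distance(mu, nu):
    """Total variation distance between two distributions (dicts mapping
    state -> probability) on a finite state space; equals half the L1 norm."""
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(s, 0.0) - nu.get(s, 0.0)) for s in states)

# usage: uniform vs. the example distribution from the next slide
mu = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
pi = {(0, 0): 0.0, (0, 1): 1/3, (1, 0): 1/3, (1, 1): 1/3}
print(tv_distance(mu, pi))  # 0.25
```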
Simple Bias Example

Target distribution over two binary variables:
    p(0,0) = 0,   p(0,1) = p(1,0) = p(1,1) = 1/3

[Figure: the sequential Gibbs transition diagram over the states (0,0), (0,1), (1,0), (1,1), with edge probabilities 1/4, 1/2, and 3/4; e.g. the chain holds at (1,1) with probability 1/2 and moves to (0,1) or (1,0) with probability 1/4 each, while (0,1) and (1,0) each hold with probability 3/4.]
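To see the bias concretely, here is a toy simulation. It is our construction under a simplified model of asynchrony: with probability `collision`, both variables are re-sampled in the same step from conditionals computed at the stale, pre-update state, mimicking two racing lock-free threads. Sequential Gibbs (collision = 0) can never enter (0,0), which has zero probability under π, but the asynchronous version visits it with positive frequency, i.e. it is biased.

```python
import random

def conditional_one(other):
    """P(x_i = 1 | x_other), derived from p(0,0)=0, p(0,1)=p(1,0)=p(1,1)=1/3."""
    return 0.5 if other == 1 else 1.0

def run(steps, collision, seed=0):
    rng = random.Random(seed)
    x = [1, 1]
    hits = 0  # times we observe the "impossible" state (0,0)
    for _ in range(steps):
        if rng.random() < collision:
            # both "threads" read the stale state, then both write
            old = list(x)
            x[0] = 1 if rng.random() < conditional_one(old[1]) else 0
            x[1] = 1 if rng.random() < conditional_one(old[0]) else 0
        else:
            # ordinary sequential Gibbs update
            s = rng.randrange(2)
            x[s] = 1 if rng.random() < conditional_one(x[1 - s]) else 0
        hits += (x == [0, 0])
    return hits / steps

print(run(100_000, collision=0.0))  # sequential: exactly 0.0
print(run(100_000, collision=0.5))  # asynchronous: clearly > 0
```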