Gibbs Sampling from k-Determinantal Point Processes, Alireza Rezaei - PowerPoint PPT Presentation



  1. Gibbs Sampling from k-Determinantal Point Processes
  Alireza Rezaei, University of Washington
  Based on joint work with Shayan Oveis Gharan

  2. Point Process: a distribution on subsets of [n] = {1, 2, …, n}.
  Determinantal Point Process: there is a PSD kernel L ∈ ℝ^{n×n} such that for all S ⊆ [n]: ℙ[S] ∝ det(L_S).

  3. Point Process: a distribution on subsets of [n] = {1, 2, …, n}.
  Determinantal Point Process: there is a PSD kernel L ∈ ℝ^{n×n} such that for all S ⊆ [n]: ℙ[S] ∝ det(L_S).
  k-DPP: the conditioning of a DPP on picking subsets of size k:
  if |S| = k: ℙ[S] ∝ det(L_S); otherwise: ℙ[S] = 0.
  Focus of the talk: sampling from k-DPPs.
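The subset probabilities above are just determinants of principal submatrices of the kernel, so they are easy to evaluate directly. A minimal sketch (the function name and the small random PSD kernel are illustrative, not from the talk):

```python
# Unnormalized k-DPP probability: P[S] is proportional to det(L_S), the
# determinant of the principal submatrix of the kernel L indexed by S.
import numpy as np

def kdpp_unnormalized_prob(L, S):
    """det(L_S) for a subset S of item indices (0-based)."""
    S = list(S)
    return np.linalg.det(L[np.ix_(S, S)])

# Toy PSD kernel L = B^T B on n = 5 items (illustrative data).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
L = B.T @ B

# Unnormalized weight of a size-3 subset, i.e. a draw space for k = 3.
p = kdpp_unnormalized_prob(L, {0, 2, 4})
```

Because L is PSD, every such determinant is nonnegative, so these weights can be normalized into a probability distribution over size-k subsets.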

  4. Point Process: a distribution on subsets of [n] = {1, 2, …, n}.
  Determinantal Point Process: there is a PSD kernel L ∈ ℝ^{n×n} such that for all S ⊆ [n]: ℙ[S] ∝ det(L_S).
  k-DPP: the conditioning of a DPP on picking subsets of size k:
  if |S| = k: ℙ[S] ∝ det(L_S); otherwise: ℙ[S] = 0.
  Focus of the talk: sampling from k-DPPs.
  DPPs are very popular probabilistic models in machine learning to capture diversity.
  Applications [Kulesza-Taskar'11, Dang'05, Nenkova-Vanderwende-McKeown'06, Mirzasoleiman-Jegelka-Krause'17]: image search, document and video summarization, tweet timeline generation, pose estimation, feature selection.

  5. Continuous Domain
  Input: a PSD operator L: 𝒟 × 𝒟 → ℝ and k.
  Select a subset S ⊂ 𝒟 of k points from the distribution with PDF p(S) ∝ det[L(x, y)]_{x,y ∈ S}.

  6. Continuous Domain
  Input: a PSD operator L: 𝒟 × 𝒟 → ℝ and k.
  Select a subset S ⊂ 𝒟 of k points from the distribution with PDF p(S) ∝ det[L(x, y)]_{x,y ∈ S}.
  Ex. Gaussian: L(x, y) = exp(−(x − y)ᵀ Σ⁻¹ (x − y) / 2)
  Applications: hyper-parameter tuning [Dodge-Jamieson-Smith'17]; learning mixtures of Gaussians [Affandi-Fox-Taskar'13].
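As a quick illustration of why this density favors diversity, here is the isotropic special case Σ = (σ²/2)·I, i.e. L(x, y) = exp(−‖x − y‖²/σ²): well-spread point sets get a larger determinant than clumped ones. Names and example points are illustrative, not from the talk:

```python
# Continuous k-DPP density sketch: for k points X = {x_1, ..., x_k} in R^d,
# p(X) is proportional to det [L(x_i, x_j)]_{i,j} with an isotropic Gaussian
# kernel L(x, y) = exp(-||x - y||^2 / sigma^2).
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """k x k matrix K[i, j] = exp(-||x_i - x_j||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def kdpp_density(X, sigma=1.0):
    """Unnormalized density det L(X); larger for well-spread point sets."""
    return np.linalg.det(gaussian_kernel_matrix(X, sigma))

# Two size-3 point sets in R^2: one spread out, one clumped together.
spread = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
clumped = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
```

Evaluating both: the clumped set's kernel matrix is nearly rank-one, so its determinant (and hence its density) is close to zero, while the spread set's is close to one.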

  7. Random-Scan Gibbs Sampler for k-DPPs
  State space: size-k subsets S = {x_1, …, x_k} (discrete: S ∈ ([n] choose k)).
  1. Stay at the current state S = {x_1, …, x_k} with prob 1/2.
  2. Choose x_i ∈ S u.a.r.
  3. Choose y ∉ S from the conditional distribution π(· | S − x_i is chosen).
  Discrete: y ∈ [n] \ S. Continuous: PDF of y ∝ π(x_1, …, x_{i−1}, y, x_{i+1}, …, x_k).
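The three steps above can be sketched for the discrete case as follows; the conditional distribution in step 3 is computed by brute force over [n] \ S (fine for small n, and only for illustration — all names and the toy kernel are assumptions):

```python
# One lazy random-scan Gibbs step for a discrete k-DPP with kernel L:
# hold with prob 1/2, else drop a uniformly random element and resample
# its replacement from the conditional distribution.
import numpy as np

def gibbs_step(L, S, rng):
    """S is a list of k distinct item indices; returns the next state."""
    n = L.shape[0]
    if rng.random() < 0.5:                  # 1. stay put with prob 1/2
        return S
    i = int(rng.integers(len(S)))           # 2. pick x_i in S u.a.r.
    rest = S[:i] + S[i + 1:]
    cand = [y for y in range(n) if y not in S]   # 3. candidates y not in S
    # Conditional weights: w(y) proportional to det(L_{rest + y}).
    w = np.array([np.linalg.det(L[np.ix_(rest + [y], rest + [y])])
                  for y in cand])
    y = cand[rng.choice(len(cand), p=w / w.sum())]
    return rest + [y]

# Toy run: PSD kernel on n = 6 items, chain over size-3 subsets.
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
L = B.T @ B
S = [0, 1, 2]
for _ in range(20):
    S = gibbs_step(L, S, rng)
```

Each step touches only a k×k determinant per candidate, which is what makes the mixing-time bound (independent of n) attractive in distributed settings, where the candidate scan parallelizes.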

  8. Main Result
  Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k⁵ · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.

  9. Main Result
  Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k⁵ · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.
  Discrete: a simple greedy initialization gives τ = O(k⁵ log k). Total running time is O(n · poly(k)).
  • Does not improve upon the previous MCMC methods [Anari-Oveis Gharan-R'16].
  • Mixing time is independent of n, so the running time in distributed settings is sublinear.
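One plausible reading of the "simple greedy initialization" (the slide does not spell out the rule, so this is an assumption): build the starting set by adding items one at a time, each time picking the item that maximizes det(L_S) of the current set. A brute-force sketch:

```python
# Greedy determinant-maximizing initialization for the Gibbs chain
# (assumed form; determinants recomputed from scratch for clarity).
import numpy as np

def greedy_init(L, k):
    """Pick k items, each maximizing det(L_S) of the growing set S."""
    S = []
    for _ in range(k):
        gains = [
            -np.inf if j in S
            else np.linalg.det(L[np.ix_(S + [j], S + [j])])
            for j in range(L.shape[0])
        ]
        S.append(int(np.argmax(gains)))
    return S

# Toy kernel on n = 6 items; start state for a k = 3 chain.
rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
L = B.T @ B
S0 = greedy_init(L, 3)
```

The first pick is simply the item with the largest diagonal entry L[j, j]; later picks trade off individual weight against similarity to items already chosen.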

  10. Main Result
  Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k⁵ · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.
  Discrete: a simple greedy initialization gives τ = O(k⁵ log k). Total running time is O(n · poly(k)).
  • Does not improve upon the previous MCMC methods [Anari-Oveis Gharan-R'16].
  • Mixing time is independent of n, so the running time in distributed settings is sublinear.
  Continuous: given access to conditional oracles (i.e., being able to run the chain), μ can be found so that τ = O(k⁵ log k).
  • First algorithm with a theoretical guarantee for sampling from continuous k-DPPs.

  11. Main Result
  Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k⁵ · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.
  Discrete: a simple greedy initialization gives τ = O(k⁵ log k). Total running time is O(n · poly(k)).
  • Does not improve upon the previous MCMC methods [Anari-Oveis Gharan-R'16].
  • Mixing time is independent of n, so the running time in distributed settings is sublinear.
  Continuous: given access to conditional oracles (i.e., being able to run the chain), μ can be found so that τ = O(k⁵ log k).
  • First algorithm with a theoretical guarantee for sampling from continuous k-DPPs.
  • Using a rejection sampler as the conditional oracle for Gaussian kernels L(x, y) = exp(−‖x − y‖² / σ²) defined on a unit sphere in ℝ^d, the total running time is:
    – If k = poly(d): poly(d, σ).
    – If k ≤ e^{d^{1−ε}} and σ = O(1): poly(d) · k^{O(1/ε)}.
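A hedged sketch of what such a rejection-sampling conditional oracle could look like for this kernel on the unit sphere (an illustration under stated assumptions, not the talk's exact construction): propose y uniformly on the sphere and accept with probability det(L_{R ∪ {y}}) / det(L_R), which lies in [0, 1] by a Schur-complement argument since L(y, y) = 1:

```python
# Rejection-sampling conditional oracle sketch for the Gaussian kernel on
# the unit sphere S^{d-1}: accepted points have density proportional to
# det(L_{rest + y}), as required by step 3 of the Gibbs sampler.
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma**2)

def sample_conditional(rest, d, sigma, rng, max_tries=10000):
    """Draw y on the unit sphere with density prop. to det(L_{rest + y})."""
    R = np.array(rest)
    K = np.array([[gauss_kernel(a, b, sigma) for b in R] for a in R])
    for _ in range(max_tries):
        y = rng.standard_normal(d)
        y /= np.linalg.norm(y)          # uniform proposal on the sphere
        b = np.array([gauss_kernel(a, y, sigma) for a in R])
        # Schur complement: det(L_{R+y}) = det(K) * (1 - b^T K^{-1} b),
        # so the det ratio below is the acceptance probability in [0, 1].
        accept = 1.0 - b @ np.linalg.solve(K, b)
        if rng.random() < accept:
            return y
    raise RuntimeError("rejection sampler did not accept")

# Toy call: resample one point given two fixed points on the sphere in R^3.
rng = np.random.default_rng(3)
rest = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
y = sample_conditional(rest, d=3, sigma=1.0, rng=rng)
```

The acceptance probability is close to 1 when the proposal lands far from the existing points, which is why the overhead stays polynomial when σ = O(1) and k is not too large relative to d.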
