Gibbs Sampling from k-Determinantal Point Processes, Alireza Rezaei - PowerPoint PPT Presentation



  1. Gibbs Sampling from k-Determinantal Point Processes
  Alireza Rezaei, University of Washington
  Based on joint work with Shayan Oveis Gharan

  2. Point Process: a distribution on subsets of [n] = {1, 2, …, n}.
  Determinantal Point Process: there is a PSD kernel L ∈ ℝ^{n×n} such that for all S ⊆ [n]: ℙ[S] ∝ det(L_S).

  3. Point Process: a distribution on subsets of [n] = {1, 2, …, n}.
  Determinantal Point Process: there is a PSD kernel L ∈ ℝ^{n×n} such that for all S ⊆ [n]: ℙ[S] ∝ det(L_S).
  k-DPP: the conditioning of a DPP on picking subsets of size k:
  if |S| = k: ℙ[S] ∝ det(L_S); otherwise: ℙ[S] = 0.
  Focus of the talk: sampling from k-DPPs.
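The subset probabilities above are just determinants of principal submatrices of the kernel, so they are easy to evaluate directly. A minimal sketch (the function name and the small random PSD kernel are illustrative, not from the talk):

```python
# Unnormalized k-DPP probability: P[S] is proportional to det(L_S), the
# determinant of the principal submatrix of the kernel L indexed by S.
import numpy as np

def kdpp_unnormalized_prob(L, S):
    """det(L_S) for a subset S of item indices (0-based)."""
    S = list(S)
    return np.linalg.det(L[np.ix_(S, S)])

# Toy PSD kernel L = B^T B on n = 5 items (illustrative data).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
L = B.T @ B

# Unnormalized weight of a size-3 subset, i.e. a draw space for k = 3.
p = kdpp_unnormalized_prob(L, {0, 2, 4})
```

Because L is PSD, every such determinant is nonnegative, so these weights can be normalized into a probability distribution over size-k subsets.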

  4. Point Process: a distribution on subsets of [n] = {1, 2, …, n}.
  Determinantal Point Process: there is a PSD kernel L ∈ ℝ^{n×n} such that for all S ⊆ [n]: ℙ[S] ∝ det(L_S).
  k-DPP: the conditioning of a DPP on picking subsets of size k:
  if |S| = k: ℙ[S] ∝ det(L_S); otherwise: ℙ[S] = 0.
  Focus of the talk: sampling from k-DPPs.
  DPPs are very popular probabilistic models in machine learning to capture diversity.
  Applications [Kulesza-Taskar'11, Dang'05, Nenkova-Vanderwende-McKeown'06, Mirzasoleiman-Jegelka-Krause'17]: image search, document and video summarization, tweet timeline generation, pose estimation, feature selection.

  5. Continuous Domain
  Input: a PSD operator L: 𝒟 × 𝒟 → ℝ and k.
  Select a subset S ⊂ 𝒟 of k points from the distribution with PDF p(S) ∝ det[L(x, y)]_{x,y ∈ S}.

  6. Continuous Domain
  Input: a PSD operator L: 𝒟 × 𝒟 → ℝ and k.
  Select a subset S ⊂ 𝒟 of k points from the distribution with PDF p(S) ∝ det[L(x, y)]_{x,y ∈ S}.
  Ex. Gaussian: L(x, y) = exp(−(x − y)ᵀ Σ⁻¹ (x − y) / 2)
  Applications: hyper-parameter tuning [Dodge-Jamieson-Smith'17]; learning mixtures of Gaussians [Affandi-Fox-Taskar'13].
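As a quick illustration of why this density favors diversity, here is the isotropic special case Σ = (σ²/2)·I, i.e. L(x, y) = exp(−‖x − y‖²/σ²): well-spread point sets get a larger determinant than clumped ones. Names and example points are illustrative, not from the talk:

```python
# Continuous k-DPP density sketch: for k points X = {x_1, ..., x_k} in R^d,
# p(X) is proportional to det [L(x_i, x_j)]_{i,j} with an isotropic Gaussian
# kernel L(x, y) = exp(-||x - y||^2 / sigma^2).
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """k x k matrix K[i, j] = exp(-||x_i - x_j||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def kdpp_density(X, sigma=1.0):
    """Unnormalized density det L(X); larger for well-spread point sets."""
    return np.linalg.det(gaussian_kernel_matrix(X, sigma))

# Two size-3 point sets in R^2: one spread out, one clumped together.
spread = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
clumped = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
```

Evaluating both: the clumped set's kernel matrix is nearly rank-one, so its determinant (and hence its density) is close to zero, while the spread set's is close to one.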

  7. Random-Scan Gibbs Sampler for k-DPPs
  State space: size-k subsets S = {x_1, …, x_k} (discrete: S ∈ ([n] choose k)).
  1. Stay at the current state S = {x_1, …, x_k} with prob 1/2.
  2. Choose x_i ∈ S u.a.r.
  3. Choose y ∉ S from the conditional distribution π(· | S − x_i is chosen).
  Discrete: y ∈ [n] \ S. Continuous: PDF of y ∝ π(x_1, …, x_{i−1}, y, x_{i+1}, …, x_k).
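The three steps above can be sketched for the discrete case as follows; the conditional distribution in step 3 is computed by brute force over [n] \ S (fine for small n, and only for illustration — all names and the toy kernel are assumptions):

```python
# One lazy random-scan Gibbs step for a discrete k-DPP with kernel L:
# hold with prob 1/2, else drop a uniformly random element and resample
# its replacement from the conditional distribution.
import numpy as np

def gibbs_step(L, S, rng):
    """S is a list of k distinct item indices; returns the next state."""
    n = L.shape[0]
    if rng.random() < 0.5:                  # 1. stay put with prob 1/2
        return S
    i = int(rng.integers(len(S)))           # 2. pick x_i in S u.a.r.
    rest = S[:i] + S[i + 1:]
    cand = [y for y in range(n) if y not in S]   # 3. candidates y not in S
    # Conditional weights: w(y) proportional to det(L_{rest + y}).
    w = np.array([np.linalg.det(L[np.ix_(rest + [y], rest + [y])])
                  for y in cand])
    y = cand[rng.choice(len(cand), p=w / w.sum())]
    return rest + [y]

# Toy run: PSD kernel on n = 6 items, chain over size-3 subsets.
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
L = B.T @ B
S = [0, 1, 2]
for _ in range(20):
    S = gibbs_step(L, S, rng)
```

Each step touches only a k×k determinant per candidate, which is what makes the mixing-time bound (independent of n) attractive in distributed settings, where the candidate scan parallelizes.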

  8. Main Result
  Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k⁵ · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.

  9. Main Result
  Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k⁵ · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.
  Discrete: a simple greedy initialization gives τ = O(k⁵ log k). Total running time is O(n · poly(k)).
  • Does not improve upon the previous MCMC methods [Anari-Oveis Gharan-R'16].
  • Mixing time is independent of n, so the running time in distributed settings is sublinear.
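One plausible reading of the "simple greedy initialization" (the slide does not spell out the rule, so this is an assumption): build the starting set by adding items one at a time, each time picking the item that maximizes det(L_S) of the current set. A brute-force sketch:

```python
# Greedy determinant-maximizing initialization for the Gibbs chain
# (assumed form; determinants recomputed from scratch for clarity).
import numpy as np

def greedy_init(L, k):
    """Pick k items, each maximizing det(L_S) of the growing set S."""
    S = []
    for _ in range(k):
        gains = [
            -np.inf if j in S
            else np.linalg.det(L[np.ix_(S + [j], S + [j])])
            for j in range(L.shape[0])
        ]
        S.append(int(np.argmax(gains)))
    return S

# Toy kernel on n = 6 items; start state for a k = 3 chain.
rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
L = B.T @ B
S0 = greedy_init(L, 3)
```

The first pick is simply the item with the largest diagonal entry L[j, j]; later picks trade off individual weight against similarity to items already chosen.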

  10. Main Result
  Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k⁵ · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.
  Discrete: a simple greedy initialization gives τ = O(k⁵ log k). Total running time is O(n · poly(k)).
  • Does not improve upon the previous MCMC methods [Anari-Oveis Gharan-R'16].
  • Mixing time is independent of n, so the running time in distributed settings is sublinear.
  Continuous: given access to conditional oracles (i.e., being able to run the chain), μ can be found so that τ = O(k⁵ log k).
  • First algorithm with a theoretical guarantee for sampling from continuous k-DPPs.

  11. Main Result
  Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k⁵ · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.
  Discrete: a simple greedy initialization gives τ = O(k⁵ log k). Total running time is O(n · poly(k)).
  • Does not improve upon the previous MCMC methods [Anari-Oveis Gharan-R'16].
  • Mixing time is independent of n, so the running time in distributed settings is sublinear.
  Continuous: given access to conditional oracles (i.e., being able to run the chain), μ can be found so that τ = O(k⁵ log k).
  • First algorithm with a theoretical guarantee for sampling from continuous k-DPPs.
  • Using a rejection sampler as the conditional oracle for Gaussian kernels L(x, y) = exp(−‖x − y‖² / σ²) defined on a unit sphere in ℝ^d, the total running time is:
    – If k = poly(d): poly(d, σ).
    – If k ≤ e^{d^{1−ε}} and σ = O(1): poly(d) · k^{O(1/ε)}.
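A hedged sketch of what such a rejection-sampling conditional oracle could look like for this kernel on the unit sphere (an illustration under stated assumptions, not the talk's exact construction): propose y uniformly on the sphere and accept with probability det(L_{R ∪ {y}}) / det(L_R), which lies in [0, 1] by a Schur-complement argument since L(y, y) = 1:

```python
# Rejection-sampling conditional oracle sketch for the Gaussian kernel on
# the unit sphere S^{d-1}: accepted points have density proportional to
# det(L_{rest + y}), as required by step 3 of the Gibbs sampler.
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma**2)

def sample_conditional(rest, d, sigma, rng, max_tries=10000):
    """Draw y on the unit sphere with density prop. to det(L_{rest + y})."""
    R = np.array(rest)
    K = np.array([[gauss_kernel(a, b, sigma) for b in R] for a in R])
    for _ in range(max_tries):
        y = rng.standard_normal(d)
        y /= np.linalg.norm(y)          # uniform proposal on the sphere
        b = np.array([gauss_kernel(a, y, sigma) for a in R])
        # Schur complement: det(L_{R+y}) = det(K) * (1 - b^T K^{-1} b),
        # so the det ratio below is the acceptance probability in [0, 1].
        accept = 1.0 - b @ np.linalg.solve(K, b)
        if rng.random() < accept:
            return y
    raise RuntimeError("rejection sampler did not accept")

# Toy call: resample one point given two fixed points on the sphere in R^3.
rng = np.random.default_rng(3)
rest = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
y = sample_conditional(rest, d=3, sigma=1.0, rng=rng)
```

The acceptance probability is close to 1 when the proposal lands far from the existing points, which is why the overhead stays polynomial when σ = O(1) and k is not too large relative to d.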
