Gibbs Sampling from k-Determinantal Point Processes
Alireza Rezaei, University of Washington
Based on joint work with Shayan Oveis Gharan

Point Process: A distribution on subsets of $[N] = \{1, 2, \dots, N\}$.
Determinantal Point Process (DPP): There is a PSD kernel $L \in \mathbb{R}^{N \times N}$ such that for all $S \subseteq [N]$: $\mathbb{P}[S] \propto \det(L_S)$, where $L_S$ is the principal submatrix of $L$ indexed by $S$.
k-DPP: Conditioning of a DPP on picking subsets of size $k$:
$\mathbb{P}[S] \propto \det(L_S)$ if $|S| = k$, and $\mathbb{P}[S] = 0$ otherwise.
Focus of the talk: Sampling from k-DPPs.
DPPs are very popular probabilistic models in machine learning for capturing diversity.
Applications [Kulesza-Taskar'11, Dang'05, Nenkova-Vanderwende-McKeown'06, Mirzasoleiman-Jegelka-Krause'17]:
- Image search, document and video summarization, tweet timeline generation, pose estimation, feature selection
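
To make the k-DPP definition concrete, here is a minimal brute-force sketch in Python with NumPy. It enumerates every size-k subset, so it is only meant for tiny N. The function name `kdpp_distribution` and the toy kernel are illustrative assumptions, not part of the talk.

```python
from itertools import combinations

import numpy as np


def kdpp_distribution(L, k):
    """Return {subset: probability} for the k-DPP with kernel L (brute force)."""
    n = L.shape[0]
    subsets = list(combinations(range(n), k))
    # The unnormalized weight of S is det(L_S), the principal minor indexed by S.
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    weights = np.clip(weights, 0.0, None)  # guard against tiny negative round-off
    return dict(zip(subsets, weights / weights.sum()))


# Toy PSD kernel: Gram matrix of three unit vectors; items 0 and 1 are nearly parallel.
V = np.array([[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]])
V /= np.linalg.norm(V, axis=1, keepdims=True)
L = V @ V.T

probs = kdpp_distribution(L, k=2)
for S, p in probs.items():
    print(S, round(float(p), 3))
```

In this toy example the pair of nearly parallel items, (0, 1), gets much less mass than the two near-orthogonal pairs, which is the diversity effect the slide refers to.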

Continuous Domain
Input: PSD operator $L : \mathcal{C} \times \mathcal{C} \to \mathbb{R}$ and $k$.
Select a subset $S \subset \mathcal{C}$ with $k$ points from a distribution with PDF
$p(S) \propto \det\left[L(x, y)\right]_{x, y \in S}$.
Ex. Gaussian: $L(x, y) = \exp\left(-\frac{(x - y)^\top \Sigma^{-1} (x - y)}{2}\right)$.
Applications:
- Hyper-parameter tuning [Dodge-Jamieson-Smith'17]
- Learning mixture of Gaussians [Affandi-Fox-Taskar'13]
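
As a small illustration of the continuous density, the sketch below evaluates the unnormalized PDF $\det[L(x, y)]_{x, y \in S}$ for the Gaussian kernel on two hand-picked point sets. The helper names and the choice $\Sigma = I$ are assumptions made for the example only.

```python
import numpy as np


def gaussian_kernel(x, y, Sigma_inv):
    """L(x, y) = exp(-(x - y)^T Sigma^{-1} (x - y) / 2)."""
    diff = x - y
    return float(np.exp(-0.5 * diff @ Sigma_inv @ diff))


def unnormalized_density(points, Sigma_inv):
    """Determinant of the k x k matrix [L(x, y)] over all pairs x, y in the point set."""
    k = len(points)
    G = np.array([[gaussian_kernel(points[i], points[j], Sigma_inv)
                   for j in range(k)] for i in range(k)])
    return float(np.linalg.det(G))


Sigma_inv = np.eye(2)  # assumed: identity covariance
spread = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
clustered = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])

p_spread = unnormalized_density(spread, Sigma_inv)
p_clustered = unnormalized_density(clustered, Sigma_inv)
print(p_spread, p_clustered)
```

The spread-out configuration has a far larger determinant than the clustered one, so a continuous k-DPP with this kernel favors well-separated points.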

Random Scan Gibbs Sampler for k-DPP
1. Stay at the current state $S = \{x_1, \dots, x_k\}$ with prob. $\frac{1}{2}$.
2. Choose $x_i \in S$ u.a.r.
3. Choose a point $y$ from the conditional dist. $\pi(\cdot \mid S - x_i \text{ is chosen})$.
Continuous: $\mathrm{PDF}(y) \propto \pi(x_1, \dots, x_{i-1}, y, x_{i+1}, \dots, x_k)$.
[Figure: state $S$ with labels $x_i$, $y$, and $[N]$]
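
The three steps of the Gibbs sampler can be sketched for the discrete case as follows. This is a minimal, unoptimized version that recomputes determinants from scratch. It assumes that the removed element itself may be drawn again as the replacement, which is what conditioning on $S - x_i$ gives on a finite ground set. The names `gibbs_step` and `run_chain` are hypothetical.

```python
import numpy as np


def gibbs_step(L, S, rng):
    """One step of the lazy random-scan Gibbs sampler on size-k subsets of range(N)."""
    if rng.random() < 0.5:              # 1. stay at the current state w.p. 1/2
        return S
    N = L.shape[0]
    S = list(S)
    i = rng.integers(len(S))            # 2. choose the element x_i to resample, u.a.r.
    rest = S[:i] + S[i + 1:]
    # Assumption: every y outside `rest` is a candidate, including the removed x_i.
    candidates = [y for y in range(N) if y not in rest]
    # 3. conditional law of the new element: weight(y) proportional to det(L_{rest + y})
    w = np.array([np.linalg.det(L[np.ix_(rest + [y], rest + [y])])
                  for y in candidates])
    w = np.clip(w, 0.0, None)           # guard against tiny negative round-off
    y = candidates[rng.choice(len(candidates), p=w / w.sum())]
    return tuple(sorted(rest + [y]))


def run_chain(L, S0, steps, seed=0):
    """Run the chain for a fixed number of steps from the starting state S0."""
    rng = np.random.default_rng(seed)
    S = tuple(sorted(S0))
    for _ in range(steps):
        S = gibbs_step(L, S, rng)
    return S


# Toy PSD kernel (Gram matrix); items 0 and 1 are nearly parallel.
V = np.array([[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]])
L = V @ V.T
final_state = run_chain(L, S0=(0, 1), steps=200)
print(final_state)
```

Each step costs N determinant evaluations here; a practical implementation would update the determinants incrementally instead.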

Main Result
Given a k-DPP $\pi$, an "approximate" sample from $\pi$ can be generated by running the Gibbs sampler for $\tau = \tilde{O}\left(k^4 \cdot \log \mathrm{var}_\pi\left(\frac{\mu}{\pi}\right)\right)$ steps, where $\mu$ is the starting dist.
Discrete: A simple greedy initialization gives $\tau = O(k^5 \log k)$. Total running time is $O(N) \cdot \mathrm{poly}(k)$.
- Does not improve upon the previous MCMC methods [Anari-Oveis Gharan-R'16].
- Mixing time is independent of $N$, so the running time in distributed settings is sublinear.
Continuous: Given access to conditional oracles (i.e., being able to run the chain), $\mu$ can be found so that $\tau = O(k^5 \log k)$.
- First algorithm with a theoretical guarantee for sampling from continuous k-DPPs.
- Using a rejection sampler as the conditional oracle for Gaussian kernels $L(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma^2}\right)$ defined on a unit sphere in $\mathbb{R}^d$, the total running time is
  • If $k = \mathrm{poly}(d)$: $\mathrm{poly}(d, k)$
  • If $k \le e^{d^{1-\delta}}$ and $\sigma = O(1)$: $\mathrm{poly}(d) \cdot k^{O(1/\delta)}$
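
One natural way to build a conditional oracle by rejection sampling for the Gaussian kernel on the unit sphere is sketched below. It is not necessarily the sampler analyzed in the talk, and the running-time bounds above are not claims about this code. The idea: with $T = S - x_i$, the Schur complement gives $\det(L_{T + y}) = \det(L_T)\,(L(y, y) - b_y^\top L_T^{-1} b_y)$ with $b_y = [L(t, y)]_{t \in T}$, and since $L(y, y) = 1$ the second factor lies in $[0, 1]$. So one can propose $y$ uniformly on the sphere and accept with that probability. All function names are hypothetical.

```python
import numpy as np


def kernel(X, Y, sigma):
    """Gaussian kernel matrix with entries exp(-||x - y||^2 / sigma^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma**2)


def uniform_on_sphere(d, rng):
    """Uniform point on the unit sphere in R^d (normalized Gaussian vector)."""
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)


def conditional_oracle(T, sigma, rng, max_tries=100_000):
    """Sample y on the sphere with density proportional to det(L_{T + y})."""
    d = T.shape[1]
    L_T = kernel(T, T, sigma)
    for _ in range(max_tries):
        y = uniform_on_sphere(d, rng)            # proposal: uniform on the sphere
        b = kernel(T, y[None, :], sigma)[:, 0]   # kernel values between T and y
        # Schur complement det(L_{T+y}) / det(L_T) = 1 - b^T L_T^{-1} b, in [0, 1].
        accept_prob = 1.0 - b @ np.linalg.solve(L_T, b)
        if rng.random() < accept_prob:
            return y
    raise RuntimeError("no proposal accepted; the acceptance rate is too low")


rng = np.random.default_rng(0)
d, k, sigma = 5, 3, 1.0
S = np.array([uniform_on_sphere(d, rng) for _ in range(k)])
i = rng.integers(k)                  # index of the point x_i to resample
T = np.delete(S, i, axis=0)          # the remaining k - 1 points
y = conditional_oracle(T, sigma, rng)
print(y)
```

The acceptance rate drops when the remaining points already cover the sphere at the kernel's length scale, so this simple proposal is only a starting point.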