
Learning Determinantal Processes with Moments and Cycles - PowerPoint PPT Presentation



  1. Learning Determinantal Processes with Moments and Cycles. J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet. ICML 2017, Sydney.

  2. Determinantal Point Processes (DPPs)
     DPP: random subset Y of [N].
     • For all J ⊆ [N], ℙ(J ⊆ Y) = det(K_J), where K ∈ ℝ^{N×N} is symmetric, 0 ≼ K ≼ I_N: the parameter (kernel) of the DPP, and K_J = (K_{i,j})_{i,j ∈ J}.
     • ℙ(1 ∈ Y) = K_{1,1}, ℙ({1,2} ⊆ Y) = K_{1,1} K_{2,2} − K_{1,2}², so K_{1,2}² ≤ ℙ(1 ∈ Y) ℙ(2 ∈ Y).
     • A.k.a. L-ensembles if 0 ≺ K ≺ I_N: ℙ(Y = J) ∝ det(L_J), e.g. L = K (I_N − K)^{-1}.
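
A minimal numpy sketch of these definitions (the 3 × 3 kernel K below is a made-up example, not one from the talk): it evaluates inclusion probabilities as principal minors and forms the L-ensemble kernel.

```python
import numpy as np

# Toy DPP kernel: symmetric with eigenvalues strictly between 0 and 1.
K = np.array([[0.6, -0.3, 0.0],
              [-0.3, 0.5, 0.2],
              [0.0, 0.2, 0.4]])

def inclusion_prob(K, J):
    """P(J ⊆ Y) = det(K_J), the principal minor of K indexed by J."""
    J = list(J)
    return np.linalg.det(K[np.ix_(J, J)])

print(inclusion_prob(K, [0]))       # = K[0, 0]
print(inclusion_prob(K, [0, 1]))    # = K[0, 0] * K[1, 1] - K[0, 1] ** 2

# L-ensemble kernel (defined when 0 ≺ K ≺ I): L = K (I - K)^{-1}.
L = K @ np.linalg.inv(np.eye(3) - K)
```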

  3. Binary representation
     DPP ⟷ random binary vector of size N, represented as a subset of [N].
     1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 ↔ {1,4,5,7,9,10,12,15,19}
     0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 ↔ {3,4,6,8,9,12,15,19}
     (X_1, …, X_N) ∈ {0,1}^N ↔ Y ⊆ [N], with X_i = 1 ⇔ i ∈ Y.
     Model for correlated Bernoulli r.v.'s (such as Ising, …) featuring repulsion.
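
A tiny illustrative helper (not from the slides) for the subset ↔ indicator-vector correspondence:

```python
def subset_to_bits(S, N):
    """Indicator vector x with x[i-1] = 1 iff i ∈ S (items 1-indexed, as on the slide)."""
    return [1 if i in S else 0 for i in range(1, N + 1)]

def bits_to_subset(x):
    """Inverse map: the set of (1-indexed) positions carrying a 1."""
    return {i + 1 for i, b in enumerate(x) if b == 1}

assert bits_to_subset(subset_to_bits({1, 4, 5, 7}, 20)) == {1, 4, 5, 7}
```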

  4. Applications of DPPs
     DPPs have become popular in various applications:
     • Quantum physics (fermionic processes) [Macchi '74]
     • Document and timeline summarization [Lin, Bilmes '12; Yao et al. '16]
     • Image search [Kulesza, Taskar '11; Affandi et al. '14]
     • Bioinformatics [Batmanghelich et al. '14]
     • Neuroscience [Snoek et al. '13]
     • Wireless or cellular network modeling [Miyoshi, Shirai '14; Torrisi, Leonardi '14; Li et al. '15; Deng et al. '15]
     And they remain an elegant and important tool in probability theory [Borodin '11].

  5. Learning DPPs
     • Given Y_1, Y_2, …, Y_n iid ∼ DPP(K), estimate K.
     • Approach: method of moments.
     • Problem: is K identified?
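
The talk assumes i.i.d. samples from DPP(K) are given. For experimenting with the estimators that follow, here is a sketch of the standard spectral sampling algorithm for DPPs (Hough et al. '06); the sampler is background material, not part of the talk.

```python
import numpy as np

def sample_dpp(K, rng=None):
    """Draw one sample Y ⊆ {0, ..., N-1} from DPP(K) via the spectral algorithm."""
    if rng is None:
        rng = np.random.default_rng()
    lam, V = np.linalg.eigh(K)                # K = V diag(lam) V^T
    V = V[:, rng.random(len(lam)) < lam]      # keep eigenvector k with probability lam_k
    Y = []
    while V.shape[1] > 0:
        # select an item with probability proportional to its squared row norm
        p = np.sum(V ** 2, axis=1)
        i = int(rng.choice(len(p), p=p / p.sum()))
        Y.append(i)
        # eliminate one column and project the rest onto the subspace {v : v[i] = 0}
        j = np.argmax(np.abs(V[i, :]))
        Vj = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V -= np.outer(Vj, V[i, :] / Vj[i])
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)            # re-orthonormalize the remaining columns
    return sorted(Y)

# e.g. samples = [sample_dpp(K) for _ in range(1000)]
```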

  6. Identification: 𝒟-similarity
     • DPP(K′) = DPP(K) ⇔ det(K′_J) = det(K_J) for all J ⊆ [N] ⇔ K′ = DKD for some D = diag(±1, …, ±1) [Oeding '11].
     • E.g.: if D has a single −1 entry, in position i, then DKD flips the signs of row i and column i of K while leaving the diagonal, and every principal minor, unchanged.
     • K and DKD are called 𝒟-similar.
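
A quick numerical check of 𝒟-similarity (reusing the toy kernel from above): flipping the sign of one row/column pair leaves every principal minor, hence the whole DPP, unchanged.

```python
import numpy as np
from itertools import combinations

K = np.array([[0.6, -0.3, 0.0],
              [-0.3, 0.5, 0.2],
              [0.0, 0.2, 0.4]])

D = np.diag([1.0, -1.0, 1.0])   # a single -1: flip the second row and column
K2 = D @ K @ D

# All principal minors agree, so DPP(K2) = DPP(K).
for r in range(1, 4):
    for J in combinations(range(3), r):
        idx = np.ix_(J, J)
        assert np.isclose(np.linalg.det(K[idx]), np.linalg.det(K2[idx]))
```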

  7. Method of moments
     • Diagonal entries: K_{i,i} = ℙ(i ∈ Y), estimated by K̂_{i,i} = (1/n) Σ_{l=1}^{n} 𝟙{i ∈ Y_l}.
     • Magnitude of the off-diagonal entries: K_{i,j}² = K_{i,i} K_{j,j} − ℙ({i,j} ⊆ Y), estimated by K̂_{i,j}² = K̂_{i,i} K̂_{j,j} − (1/n) Σ_{l=1}^{n} 𝟙{{i,j} ⊆ Y_l}.
     • Signs (up to 𝒟-similarity)? Use estimates of higher moments: det(K_J) ≈ (1/n) Σ_{l=1}^{n} 𝟙{J ⊆ Y_l}.
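
A sketch of the first two estimators in numpy (my own illustration; the function names and the clipping of noise-induced negative values are choices not specified in the talk):

```python
import numpy as np

def estimate_moments(samples, N):
    """Empirical frequencies: diag_hat[i] ≈ P(i ∈ Y), pair_hat[i, j] ≈ P({i, j} ⊆ Y)."""
    diag_hat = np.zeros(N)
    pair_hat = np.zeros((N, N))
    for Y in samples:                       # each Y is an iterable of item indices
        Y = list(Y)
        diag_hat[Y] += 1.0
        for a in Y:
            for b in Y:
                pair_hat[a, b] += 1.0
    return diag_hat / len(samples), pair_hat / len(samples)

def estimate_abs_kernel(diag_hat, pair_hat):
    """Estimated diagonal entries and off-diagonal magnitudes |K_ij|."""
    sq = np.outer(diag_hat, diag_hat) - pair_hat   # ≈ K_ii K_jj - P({i,j} ⊆ Y) = K_ij^2
    np.fill_diagonal(sq, 0.0)
    K_abs = np.sqrt(np.clip(sq, 0.0, None))        # clip negatives due to sampling noise
    K_abs += np.diag(diag_hat)
    return K_abs
```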

  8. Determinantal Graphs
     Definition: G = ([N], E) with {i,j} ∈ E ⇔ K_{i,j} ≠ 0 (for i ≠ j).
     Examples:
     • a 3 × 3 kernel whose only zero off-diagonal entry is K_{1,3} gives the path 1 – 2 – 3;
     • a 4 × 4 kernel whose only zero off-diagonal entries are K_{1,4} and K_{2,4} gives the triangle on {1, 2, 3} with the pendant edge 3 – 4.
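
In code, the estimated determinantal graph is just the thresholded support of the estimated magnitudes (e.g. the output of estimate_abs_kernel above); the α/2 threshold is a natural choice under the separation assumption of slide 11, but the constant is mine, not the talk's.

```python
def determinantal_graph(K_abs_hat, alpha):
    """Edges {i, j} where the estimated |K_ij| clears the threshold alpha / 2."""
    N = K_abs_hat.shape[0]
    return [(i, j) for i in range(N) for j in range(i + 1, N)
            if K_abs_hat[i, j] > alpha / 2]
```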

  9. Cycle sparsity
     • Cycle basis: a family of induced cycles that spans the cycle space (the slide's figure shows two induced cycles C_1, C_2 and their sum C_1 + C_2).
     • Cycle sparsity: the length ℓ of the largest cycle needed to span the cycle space.
     • Horton's algorithm: finds a cycle basis with cycle lengths ≤ ℓ in O(|E|² N / ln N) steps [Horton '87; Amaldi et al. '10].
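
For a hands-on look (background, not from the talk): networkx ships a minimum-weight cycle basis routine in the Horton/de Pina family, and on an unweighted graph the longest cycle in such a basis gives the cycle sparsity.

```python
import networkx as nx

# Triangle {1, 2, 3} with a pendant edge 3-4 (the second example from slide 8).
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4)])

basis = nx.minimum_cycle_basis(G)                # e.g. [[1, 2, 3]]
ell = max((len(c) for c in basis), default=0)    # cycle sparsity; 0 for a forest
print(basis, ell)                                # here ell = 3
```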

  10. Cycle sparsity
     Theorem: K is completely determined, up to 𝒟-similarity, by its principal minors of order ≤ ℓ.
     Key: the signs of ∏_{{i,j} ∈ C} K_{i,j} for each cycle C of length ≤ ℓ.

  11. Learning the signs
     • Assumption: K ∈ 𝒦_α, i.e., either K_{i,j} = 0 or |K_{i,j}| ≥ α > 0.
     • All K_{i,i}'s and |K_{i,j}|'s are estimated at the n^{-1/2} rate.
     • G is recovered exactly w.h.p.
     • Horton's algorithm outputs a minimum cycle basis ℬ.
     • For all induced cycles C ∈ ℬ: det(K_C) = F_C({K_{i,i}}, {K_{i,j}²}) + 2 (−1)^{|C|+1} ∏_{{i,j} ∈ C} K_{i,j}.
     • Recover the sign of ∏_{{i,j} ∈ C} K_{i,j} w.h.p.
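
A toy numerical check of the last two bullets for a triangle C = {1, 2, 3} (my own illustration; even_part_3cycle spells out F_C for |C| = 3): the residual det K_C − F_C carries the sign of the cycle product, so that sign can be read off from estimated minors.

```python
import numpy as np

K = np.array([[0.6, -0.3, 0.1],
              [-0.3, 0.5, 0.2],
              [0.1, 0.2, 0.4]])

def even_part_3cycle(K):
    """F_C for |C| = 3: a polynomial in the K_ii and the K_ij^2 only."""
    d = np.diag(K)
    return (d[0] * d[1] * d[2]
            - d[0] * K[1, 2] ** 2
            - d[1] * K[0, 2] ** 2
            - d[2] * K[0, 1] ** 2)

cycle_product = K[0, 1] * K[1, 2] * K[0, 2]

# det K_C = F_C + 2 * (-1)^(|C|+1) * cycle_product; for |C| = 3 the sign factor is +1.
assert np.isclose(np.linalg.det(K), even_part_3cycle(K) + 2 * cycle_product)

# Sign recovery: with estimated minors in place of the exact ones, the sign of
# det K_C - F_C (times (-1)^(|C|+1) in general) reveals the sign of the cycle product.
print(np.sign(np.linalg.det(K) - even_part_3cycle(K)))   # -1: the product here is negative
```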

  12. Main result
     Theorem: Let K ∈ 𝒦_α with cycle sparsity ℓ and let ε > 0. There is an algorithm that outputs K̂ in O(|E|³ + n N²) steps for which the following holds with probability at least 1 − n^{−A}:
     n ≳ ( 1/(α² ε²) + ℓ²/α^{2ℓ} ) ln N  ⇒  min_D ‖K̂ − D K D‖_∞ ≤ ε,
     where the minimum is over diagonal matrices D with ±1 entries.
     Near-optimal rate in a minimax sense.

  13. Conclusions
     • Estimation of K by a method of moments in polynomial time.
     • Rates of estimation characterized by the topology of the determinantal graph through its cycle sparsity ℓ.
     • These rates are provably optimal (up to logarithmic factors).
     • Adaptation to ℓ.
