
Rates of Estimation for Discrete Determinantal Point Processes

Rates of Estimation for Discrete Determinantal Point Processes. V.-E. Brunel, A. Moitra, P. Rigollet, J. Urschel. COLT 2017, Amsterdam. Discrete DPPs: random variables on the hypercube $\{0,1\}^N$, represented as subsets of $[N]$.


  1. Rates of Estimation for Discrete Determinantal Point Processes. V.-E. Brunel, A. Moitra, P. Rigollet, J. Urschel. COLT 2017, Amsterdam

  2. Discrete DPPs
     Random variables on the hypercube $\{0,1\}^N$, represented as subsets of $[N]$.
     1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 ↔ {1,4,5,7,9,10,12,15,19}
     0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 ↔ {3,4,6,8,9,12,15,19}
     1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0 1 ↔ {1,4,8,12,14,17,18,20}
     …
     0 0 1 0 0 1 0 1 1 0 0 0 0 0 1 1 0 1 0 0 ↔ {3,6,8,9,15,16,18}
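The correspondence between points of the hypercube and subsets can be sketched in a few lines of Python. The function names `vector_to_subset` and `subset_to_vector` are illustrative and do not come from the slides; items are numbered from 1, as in the examples above.

```python
def vector_to_subset(bits):
    """Return the 1-indexed positions of the ones in a 0/1 vector."""
    return {i + 1 for i, b in enumerate(bits) if b == 1}


def subset_to_vector(subset, n):
    """Return the length-n indicator vector of a subset of {1, ..., n}."""
    return [1 if i in subset else 0 for i in range(1, n + 1)]


# First example from the slide (N = 20).
bits = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
subset = vector_to_subset(bits)
print(sorted(subset))  # [1, 4, 5, 7, 9, 10, 12, 15, 19]
```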

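  3. Discrete DPPs
     • Probabilistic model for correlated Bernoulli r.v.
     • Feature repulsion (negative association)
     Definition: a random subset $Y \subseteq [N]$ is a DPP with kernel $K \in \mathbb{R}^{N \times N}$, symmetric, $0 \preceq K \preceq I$, if $\mathbb{P}[J \subseteq Y] = \det(K_J)$, $\forall J \subseteq [N]$.
     • $K_{i,j}$: repulsion between items $i$ and $j$.
     • PMF: $\mathbb{P}[Y = J] = |\det(K - I_{\bar J})|$, where $I_{\bar J}$ is the diagonal matrix with ones at the indices outside $J$.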
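As a sanity check on the definition, the following minimal sketch evaluates the inclusion probabilities and the PMF for a tiny kernel. The 2×2 kernel `K` is a made-up example (not from the slides), items are 0-indexed for convenience, and the determinant is computed by naive Laplace expansion, which is only sensible for very small matrices.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    n = len(m)
    if n == 0:
        return 1.0  # determinant of the empty matrix
    if n == 1:
        return m[0][0]
    total = 0.0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total


def marginal(K, J):
    """P[J is a subset of Y] = det(K_J), for a set J of 0-indexed items."""
    idx = sorted(J)
    return det([[K[i][j] for j in idx] for i in idx])


def pmf(K, J):
    """P[Y = J] = |det(K - I_{complement of J})|."""
    n = len(K)
    M = [[K[i][j] - (1.0 if i == j and i not in J else 0.0) for j in range(n)]
         for i in range(n)]
    return abs(det(M))


# Hypothetical kernel: symmetric, eigenvalues 0.3 and 0.7, so 0 <= K <= I.
K = [[0.5, 0.2], [0.2, 0.5]]
subsets = [set(c) for r in range(3) for c in combinations(range(2), r)]
total_mass = sum(pmf(K, J) for J in subsets)
print(total_mass)              # the PMF should sum to 1 up to rounding
print(marginal(K, {0, 1}))     # joint inclusion probability of both items
print(marginal(K, {0}) * marginal(K, {1}))  # value under independence
```

In this example the joint inclusion probability is smaller than the product of the two marginals, which is the repulsion effect described on the slide.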

  4. Goal
     • Given $Y_1, Y_2, \ldots, Y_n \overset{\text{iid}}{\sim} \mathrm{DPP}(K^*)$, estimate $K^*$.
     • Approach: maximum likelihood estimator (MLE).
     • Question: rate of convergence of the MLE?

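  5. Identification
     • $\mathrm{DPP}(K) = \mathrm{DPP}(K^*)$ ⇔ $\det(K_J) = \det(K^*_J)$, $\forall J \subseteq [N]$ ⇔ $K = D K^* D$ for some $D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$.
     • E.g. (sign patterns, with the signs of rows and columns 2 and 3 flipped):
       $K^* = \begin{pmatrix} + & + & + & + \\ + & + & + & + \\ + & + & + & + \\ + & + & + & + \end{pmatrix}$ ⇝ $D K^* D = \begin{pmatrix} + & - & - & + \\ - & + & + & - \\ - & + & + & - \\ + & - & - & + \end{pmatrix}$
     • Measure of the error of an estimator $\hat K$: $\ell(\hat K, K^*) = \min_D \|\hat K - D K^* D\|_F$.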
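The error measure can be illustrated with a brute-force sketch that tries every diagonal sign matrix $D$. The function name `dpp_distance` and the example matrices are hypothetical; the enumeration costs $2^N$ evaluations, so this is only meant for very small $N$.

```python
from itertools import product
from math import sqrt


def dpp_distance(K_hat, K_star):
    """min over D = diag(+-1) of the Frobenius norm of K_hat - D K_star D."""
    n = len(K_star)
    best = float("inf")
    for signs in product((1, -1), repeat=n):
        # (D K* D)[i][j] = signs[i] * signs[j] * K*[i][j]
        sq = sum((K_hat[i][j] - signs[i] * signs[j] * K_star[i][j]) ** 2
                 for i in range(n) for j in range(n))
        best = min(best, sqrt(sq))
    return best


# Hypothetical example: K_hat equals K_star up to a sign flip of item 1,
# so both kernels define the same DPP and the distance is zero.
K_star = [[0.5, 0.2], [0.2, 0.5]]
K_hat = [[0.5, -0.2], [-0.2, 0.5]]
print(dpp_distance(K_hat, K_star))
```

A plain Frobenius distance between `K_hat` and `K_star` would be strictly positive here even though the two kernels are statistically indistinguishable, which is why the minimum over $D$ is taken.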

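  6. Maximum likelihood estimation
     • Log-likelihood: $\hat\Psi(K) = \sum_{J \subseteq [N]} \hat p_J \ln|\det(K - I_{\bar J})|$, where $\hat p_J$ is the empirical frequency of $J$ in the sample.
     • MLE: $\hat K \in \operatorname{argmax}_K \hat\Psi(K)$
     • $\Psi(K) \triangleq \mathbb{E}[\hat\Psi(K)] = \sum_{J \subseteq [N]} p^*_J \ln|\det(K - I_{\bar J})| = \Psi(K^*) - \mathrm{KL}(\mathrm{DPP}(K^*), \mathrm{DPP}(K))$, where $p^*_J = \mathbb{P}[Y = J]$ under $\mathrm{DPP}(K^*)$.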
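The empirical log-likelihood can be sketched as the sample average of the log-PMF, which is the same quantity as the sum over $J$ weighted by empirical frequencies. The kernel and the sample list below are made up for illustration, items are 0-indexed, and the sketch only evaluates $\hat\Psi$; it does not attempt the maximization.

```python
from math import log


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    n = len(m)
    if n == 0:
        return 1.0
    if n == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(n))


def log_pmf(K, J):
    """ln |det(K - I_{complement of J})| for a set J of 0-indexed items."""
    n = len(K)
    M = [[K[i][j] - (1.0 if i == j and i not in J else 0.0) for j in range(n)]
         for i in range(n)]
    return log(abs(det(M)))


def log_likelihood(K, samples):
    """Average log-PMF over the observed subsets."""
    return sum(log_pmf(K, J) for J in samples) / len(samples)


# Hypothetical kernel and observations, for illustration only.
K = [[0.5, 0.2], [0.2, 0.5]]
K_flipped = [[0.5, -0.2], [-0.2, 0.5]]  # D K D with D = diag(1, -1)
samples = [set(), {0}, {0}, {1}, {0, 1}]
print(log_likelihood(K, samples))
print(log_likelihood(K_flipped, samples))  # same value up to rounding
```

The two printed values agree, reflecting that the likelihood cannot distinguish $K$ from $DKD$.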

  7. Likelihood geometry
     Fisher information: $-\nabla^2 \Psi(K^*)$
     [Figure: two sketches of $\Psi(K)$ against $K$ around $K^*$, one with $\nabla^2 \Psi(K^*) < 0$ and one with $\nabla^2 \Psi(K^*) = 0$.]
     What is the order of the first nondegenerate derivative of $\Psi$ at $K = K^*$?

  8. Determinantal Graphs & Irreducibility
     Definition: $G = ([N], E)$, with $\{i,j\} \in E \Leftrightarrow K^*_{i,j} \neq 0$.
     • $K^*$ is irreducible iff $G$ is connected.
     • Otherwise, $K^*$ is block diagonal.
     • Rk: $K^*$ is block diagonal ⇒ $Y$ = union of independent DPPs.
     • Write $i \sim j$ when $i$ and $j$ are connected in $G$.
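Checking irreducibility amounts to a graph connectivity test, sketched below with a breadth-first search. The names `determinantal_graph` and `is_irreducible`, the tolerance parameter `tol`, and the two 3×3 kernels are assumptions made for illustration; items are 0-indexed.

```python
from collections import deque


def determinantal_graph(K, tol=0.0):
    """Adjacency lists: i and j are neighbours iff |K[i][j]| > tol, i != j."""
    n = len(K)
    return {i: [j for j in range(n) if j != i and abs(K[i][j]) > tol]
            for i in range(n)}


def is_irreducible(K, tol=0.0):
    """True iff the determinantal graph of K is connected (BFS from item 0)."""
    n = len(K)
    if n == 0:
        return True
    adj = determinantal_graph(K, tol)
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


# Hypothetical kernels: a path graph 0 - 1 - 2, and a block diagonal kernel.
K_path = [[0.5, 0.2, 0.0], [0.2, 0.5, 0.2], [0.0, 0.2, 0.5]]
K_block = [[0.5, 0.2, 0.0], [0.2, 0.5, 0.0], [0.0, 0.0, 0.5]]
print(is_irreducible(K_path))   # True
print(is_irreducible(K_block))  # False
```

Note that in `K_path` items 0 and 2 are connected in the graph ($0 \sim 2$) even though the entry between them is zero.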

  9. Main Results: Irreducible case
     Theorem 1: $K^*$ is irreducible ⇔ $\nabla^2\Psi(K^*)$ is negative definite.
     Statistical consequences:
     ➢ $\ell(\hat K, K^*) = O_{\mathbb{P}}(n^{-1/2})$
     ➢ CLT

  10. Main Results: Block diagonal case (1)
      Theorem 2: $\operatorname{Ker} \nabla^2\Psi(K^*) = \{H \in \mathbb{R}^{N \times N} : H_{i,j} = 0, \forall i \sim j\}$. $\nabla^2\Psi(K^*)$ is negative definite along directions supported on the blocks of $K^*$.
      Theorem 3: For $H \in \operatorname{Ker} \nabla^2\Psi(K^*) \setminus \{0\}$: $\nabla^3\Psi(K^*)(H^{\otimes 3}) = 0$ and $\nabla^4\Psi(K^*)(H^{\otimes 4}) < 0$.

  11. Main Results: Block diagonal case (2)
      Statistical consequences:
      ➢ $\ell(\hat K, K^*) = O_{\mathbb{P}}(n^{-1/6})$
      ➢ $\ell(\hat K_S, K^*_S) = O_{\mathbb{P}}(n^{-1/2})$ for all blocks $S$ of $K^*$.

  12. Conclusions
      • Rates of convergence of the MLE: $n^{-1/2}$ if $K^*$ is irreducible, $n^{-1/6}$ otherwise.
      • The rate is determined only by the connectedness of the determinantal graph.
      • Hidden constants can be arbitrarily large in $N$: e.g., if $G$ is a path graph.
      • In another paper* we show that the sample complexity of a method-of-moments estimator is determined by the cycle sparsity of $G$.
      * Learning Determinantal Point Processes from Moments and Cycles, J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet, ICML 2017
