Rates of Estimation for Discrete Determinantal Point Processes
V.-E. Brunel, A. Moitra, P. Rigollet, J. Urschel
COLT 2017, Amsterdam
Discrete DPPs
Random variables on the hypercube $\{0,1\}^N$, represented as subsets of $[N]$:
1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 ↔ {1,4,5,7,9,10,12,15,19}
0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 ↔ {3,4,6,8,9,12,15,19}
1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0 1 ↔ {1,4,8,12,14,17,18,20}
…
0 0 1 0 0 1 0 1 1 0 0 0 0 0 1 1 0 1 0 0 ↔ {3,6,8,9,15,16,18}
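A minimal Python sketch of the encoding above, assuming 1-based item labels as on the slide. The helper names `subset_to_indicator` and `indicator_to_subset` are illustrative and not part of the talk.

```python
import numpy as np

def subset_to_indicator(subset, n):
    """Return the 0/1 vector of length n with ones at the 1-based items in subset."""
    x = np.zeros(n, dtype=int)
    idx = np.array(sorted(subset), dtype=int) - 1  # shift to 0-based positions
    x[idx] = 1
    return x

def indicator_to_subset(x):
    """Return the sorted list of 1-based positions where x equals 1."""
    return [int(i) + 1 for i in np.flatnonzero(np.asarray(x))]

# First example from the slide, with N = 20.
x = subset_to_indicator([1, 4, 5, 7, 9, 10, 12, 15, 19], 20)
print(x)
print(indicator_to_subset(x))
```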
Discrete DPPs
• Probabilistic model for correlated Bernoulli r.v.
• Feature repulsion (negative association)
Definition: Random subset $Y \subseteq [N]$; $K \in \mathbb{R}^{N \times N}$, symmetric, $0 \preceq K \preceq I$;
$\mathbb{P}[J \subseteq Y] = \det(K_J), \quad \forall J \subseteq [N]$
• $K_{i,j}$: repulsion between items $i$ and $j$.
• PMF: $\mathbb{P}[Y = J] = |\det(K - I_{\bar{J}})|$
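The two formulas on this slide can be checked numerically on a small example. The sketch below assumes 0-based item labels and reads $I_{\bar{J}}$ as the diagonal 0/1 matrix with ones on the complement of $J$; the $3 \times 3$ kernel and the function names are illustrative choices, not taken from the talk.

```python
import itertools
import numpy as np

def dpp_pmf(K, J):
    """P[Y = J] = |det(K - I_{J-bar})| for a subset J of {0, ..., N-1}."""
    n = K.shape[0]
    # Diagonal indicator of the complement of J.
    comp = np.array([0.0 if i in J else 1.0 for i in range(n)])
    return abs(np.linalg.det(K - np.diag(comp)))

def all_subsets(n):
    """All subsets of {0, ..., n-1} as tuples, including the empty one."""
    for r in range(n + 1):
        yield from itertools.combinations(range(n), r)

# Illustrative symmetric kernel with eigenvalues in [0, 1] (Gershgorin bound).
K = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.4, 0.1],
              [0.0, 0.1, 0.6]])

pmf = {J: dpp_pmf(K, J) for J in all_subsets(3)}
print("total mass:", sum(pmf.values()))

# Marginal check: P[J subset of Y] should match det(K_J).
J = (0, 1)
marginal = sum(p for A, p in pmf.items() if set(J) <= set(A))
print(marginal, np.linalg.det(K[np.ix_(J, J)]))
```

Enumerating all $2^N$ subsets is only feasible for very small $N$; it is used here purely as a sanity check.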
Goal
• Given $Y_1, Y_2, \ldots, Y_n \overset{\text{iid}}{\sim} \mathrm{DPP}(K^*)$, estimate $K^*$.
• Approach: Maximum Likelihood Estimator (MLE).
• Question: What is the rate of convergence of the MLE?
Identification
• $\mathrm{DPP}(K) = \mathrm{DPP}(K^*) \iff \det(K_J) = \det(K^*_J),\ \forall J \subseteq [N] \iff K = D K^* D$ for some $D = \begin{pmatrix} \pm 1 & & 0 \\ & \ddots & \\ 0 & & \pm 1 \end{pmatrix}$.
• E.g.: [sign patterns of $K^*$ and $D K^* D$, shown as two matrices of $+$ and $-$ entries]
Measure of the error of an estimator $\hat{K}$: $\ell(\hat{K}, K^*) = \min_{D} \|\hat{K} - D K^* D\|_F$
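A small numerical illustration of the identification statement and of the error measure $\ell$, as a sketch only. The brute-force minimum over all $2^N$ sign matrices is an illustrative shortcut for tiny $N$, not a method from the talk; the kernel and the names `principal_minors` and `sign_invariant_loss` are hypothetical.

```python
import itertools
import numpy as np

def principal_minors(K):
    """Map each nonempty subset J (0-based tuple) to det(K_J)."""
    n = K.shape[0]
    return {J: np.linalg.det(K[np.ix_(J, J)])
            for r in range(1, n + 1)
            for J in itertools.combinations(range(n), r)}

def sign_invariant_loss(K_hat, K_star):
    """min over D = diag(+-1) of the Frobenius norm ||K_hat - D K_star D||_F."""
    n = K_star.shape[0]
    best = np.inf
    for signs in itertools.product([1.0, -1.0], repeat=n):
        d = np.array(signs)
        flipped = K_star * np.outer(d, d)  # entrywise form of D K_star D
        best = min(best, np.linalg.norm(K_hat - flipped))  # Frobenius by default
    return best

K_star = np.array([[0.5, 0.2, 0.0],
                   [0.2, 0.4, 0.1],
                   [0.0, 0.1, 0.6]])
D = np.diag([1.0, -1.0, 1.0])
K_flipped = D @ K_star @ D

minors_a, minors_b = principal_minors(K_star), principal_minors(K_flipped)
print(all(np.isclose(minors_a[J], minors_b[J]) for J in minors_a))
print(np.linalg.norm(K_flipped - K_star), sign_invariant_loss(K_flipped, K_star))
```

The plain Frobenius distance between `K_flipped` and `K_star` is nonzero even though both define the same DPP, which is why the error is measured up to the sign matrix $D$.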
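Maximum likelihood estimation
• Log-likelihood: $\hat{\Psi}(K) = \sum_{J \subseteq [N]} \hat{p}_J \ln|\det(K - I_{\bar{J}})|$
• MLE: $\hat{K} \in \operatorname{argmax}_K \hat{\Psi}(K)$
$\Psi(K) = \mathbb{E}[\hat{\Psi}(K)] = \sum_{J \subseteq [N]} p^*_J \ln|\det(K - I_{\bar{J}})| = \Psi(K^*) - \mathrm{KL}(\mathrm{DPP}(K^*), \mathrm{DPP}(K))$
Here $\hat{p}_J$ is the empirical frequency of $J$ in the sample and $p^*_J = \mathbb{P}[Y = J]$ under $\mathrm{DPP}(K^*)$.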
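The empirical log-likelihood can be evaluated directly from a list of observed subsets. This is a sketch under stated assumptions: items are 0-based, the weights are empirical frequencies, and the function name `log_likelihood` is illustrative. It evaluates the objective only; it does not compute the MLE.

```python
from collections import Counter
import numpy as np

def log_likelihood(K, samples):
    """Average over the sample of ln|det(K - I_{J-bar})|, grouped by distinct subset J."""
    n = K.shape[0]
    counts = Counter(frozenset(s) for s in samples)
    total = 0.0
    for J, c in counts.items():
        # Diagonal indicator of the complement of J.
        comp = np.array([0.0 if i in J else 1.0 for i in range(n)])
        _, logabsdet = np.linalg.slogdet(K - np.diag(comp))
        total += (c / len(samples)) * logabsdet
    return total

# Toy sample over N = 2 items (illustrative data).
samples = [set(), {0}, {1}, {0, 1}]
print(log_likelihood(np.diag([0.5, 0.5]), samples))
```

`numpy.linalg.slogdet` returns the sign and the log of the absolute determinant, which avoids underflow when the determinant is very small.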
Likelihood geometry
Fisher information: $-\nabla^2 \Psi(K^*)$
[Figure: two sketches of $\Psi(K)$ against $K$ with maximum at $K^*$, one with $\nabla^2 \Psi(K^*) < 0$ and one with $\nabla^2 \Psi(K^*) = 0$]
What is the order of the first non-degenerate derivative of $\Psi$ at $K = K^*$?
Determinantal Graphs & Irreducibility
Definition: $G = ([N], E)$: $\{i, j\} \in E \iff K^*_{i,j} \neq 0$.
• $K^*$ is irreducible iff $G$ is connected.
• Otherwise, $K^*$ is block diagonal.
• Rk: $K^*$ is block diagonal $\iff$ $Y$ = union of independent DPPs
• Write $i \sim j$ when $i$ and $j$ are connected in $G$.
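The definition above translates into a connected-components computation on the support of $K^*$. The sketch below is illustrative: the tolerance `tol` used to decide whether an entry is nonzero is an assumption for floating-point input, and the function names are hypothetical.

```python
import numpy as np

def determinantal_blocks(K, tol=1e-12):
    """Connected components of the graph with an edge {i, j} whenever |K[i, j]| > tol."""
    n = K.shape[0]
    adjacent = np.abs(K) > tol
    seen = [False] * n
    blocks = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search from an unvisited item.
        stack, component = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if adjacent[i, j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(component))
    return blocks

def is_irreducible(K, tol=1e-12):
    """True when the determinantal graph has a single connected component."""
    return len(determinantal_blocks(K, tol)) == 1

path_kernel = np.array([[0.5, 0.2, 0.0],
                        [0.2, 0.4, 0.1],
                        [0.0, 0.1, 0.6]])
block_kernel = np.array([[0.5, 0.2, 0.0],
                         [0.2, 0.4, 0.0],
                         [0.0, 0.0, 0.6]])
print(determinantal_blocks(path_kernel), is_irreducible(path_kernel))
print(determinantal_blocks(block_kernel), is_irreducible(block_kernel))
```

Note that items 0 and 2 of `path_kernel` satisfy $i \sim j$ even though the corresponding entry is zero, since they are joined by a path through item 1.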
Main Results: Irreducible case
Theorem 1: $K^*$ irreducible $\iff$ $\nabla^2 \Psi(K^*)$ is negative definite
Statistical consequences:
• $\ell(\hat{K}, K^*) = O_{\mathbb{P}}(n^{-1/2})$
• CLT
Main Results: Block diagonal case (1)
Theorem 2: $\operatorname{Ker} \nabla^2 \Psi(K^*) = \{H \in \mathbb{R}^{N \times N} : H_{i,j} = 0,\ \forall i \sim j\}$
$\nabla^2 \Psi(K^*)$ is negative definite along directions supported on the blocks of $K^*$.
Theorem 3: For $H \in \operatorname{Ker} \nabla^2 \Psi(K^*) \setminus \{0\}$:
$\nabla^3 \Psi(K^*)(H^{\otimes 3}) = 0$
$\nabla^4 \Psi(K^*)(H^{\otimes 4}) < 0$
Main Results: Block diagonal case (2)
Statistical consequences:
• $\ell(\hat{K}, K^*) = O_{\mathbb{P}}(n^{-1/6})$
• $\ell(\hat{K}_S, K^*_S) = O_{\mathbb{P}}(n^{-1/2})$ for all blocks $S$ of $K^*$.
Conclusions
• Rates of convergence of the MLE: $n^{-1/2}$ if $K^*$ is irreducible, $n^{-1/6}$ otherwise
• Rate only determined by connectedness of the determinantal graph
• Hidden constants can be arbitrarily large in $N$: e.g., if $G$ is a path graph
• In another paper* we show that the sample complexity of a method-of-moments estimator is determined by the cycle sparsity of $G$.
* Learning Determinantal Point Processes from Moments and Cycles, J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet, ICML 2017