Dirichlet process mixtures are inconsistent for the number of components in a finite mixture


  1. Dirichlet process mixtures are inconsistent for the number of components in a finite mixture. Jeffrey W. Miller and Matthew T. Harrison, Division of Applied Mathematics, 182 George Street, Providence, RI 02912. ICERM, September 17, 2012.

  2. Outline of the talk: (1) Introduction; (2) A consistent alternative: Mixture of finite mixtures (MFM); (3) Empirical demonstrations; (4) Results; (5) Examples from the literature; (6) Properties of MFM models; (7) Open questions. J. W. Miller (Brown University), DPM inconsistency, ICERM, September 17, 2012, slide 2 / 40.

  4. Notational preliminaries. Suppose $\{p_\theta : \theta \in \Theta\}$ is a parametric family, with $\Theta \subset \mathbb{R}^k$. We will be interested in discrete probability measures of the form
     $$q = \sum_{i=1}^{\infty} \pi_i \delta_{\theta_i},$$
     where $\theta_1, \theta_2, \ldots \in \Theta$ and $\delta_\theta$ is the unit point mass at $\theta \in \Theta$. Let $f_q$ denote the density of the resulting mixture, that is,
     $$f_q(x) = \int_\Theta p_\theta(x)\, dq(\theta) = \sum_{i=1}^{\infty} \pi_i\, p_{\theta_i}(x).$$
     Let $s(q) = |\mathrm{support}(q)| \in \{1, 2, \ldots\} \cup \{\infty\}$. Assume identifiability in the sense that $f_q = f_{q'} \Rightarrow q = q'$ for any $q, q'$ with finite support.

  5. Notational preliminaries (summary). $q = \sum_{i=1}^{\infty} \pi_i \delta_{\theta_i}$ (mixing distribution); $f_q(x) = \sum_i \pi_i\, p_{\theta_i}(x)$ (density); $s(q) = |\mathrm{support}(q)|$ (number of components). For example, $\{p_\theta : \theta \in \Theta\}$ might be univariate normals with $\theta = (\mu, \sigma^2)$.
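To make the notation concrete, here is a minimal Python sketch that evaluates $f_q(x)$ and $s(q)$ for a finitely supported $q$ in the univariate normal example. The function names and the particular two-atom $q$ are illustrative assumptions, not anything taken from the talk.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, where sigma2 is the variance."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_density(x, weights, thetas):
    """f_q(x) = sum_i pi_i * p_{theta_i}(x) for a finitely supported q."""
    return sum(w * normal_pdf(x, mu, s2) for w, (mu, s2) in zip(weights, thetas))

def num_components(weights, thetas):
    """s(q): the number of distinct atoms that carry positive weight."""
    return len({theta for w, theta in zip(weights, thetas) if w > 0})

# Hypothetical q with two atoms, theta = (mu, sigma^2).
weights = [0.5, 0.5]
thetas = [(0.0, 1.0), (6.0, 1.0)]

print(mixture_density(0.0, weights, thetas))
print(num_components(weights, thetas))
```

Counting distinct atoms with positive weight, rather than the length of the weight list, matches the definition of $s(q)$ as the size of the support.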

  6. Two distributions. Notation: $q = \sum_{i=1}^{\infty} \pi_i \delta_{\theta_i}$, $f_q(x) = \sum_i \pi_i\, p_{\theta_i}(x)$, $s(q) = |\mathrm{support}(q)|$.
     Data distribution (the "true" distribution): $X_1, X_2, \ldots \overset{\text{iid}}{\sim} f_{q_0}$ for some $q_0$ with $s(q_0) < \infty$.
     Model distribution: $Q \sim$ some prior on discrete measures $q$; $X_1, X_2, \ldots \overset{\text{iid}}{\sim} f_Q$ (given $Q$).

  7. Two distributions (continued). Notation: $q = \sum_{i=1}^{\infty} \pi_i \delta_{\theta_i}$, $f_q(x) = \sum_i \pi_i\, p_{\theta_i}(x)$, $s(q) = |\mathrm{support}(q)|$.
     Data distribution (the "true" distribution): $X_1, X_2, \ldots \overset{\text{iid}}{\sim} f_{q_0}$ for some $q_0$ with $s(q_0) < \infty$.
     Model distribution: $Q \sim$ some prior on discrete measures $q$; $X_1, X_2, \ldots \overset{\text{iid}}{\sim} f_Q$ (given $Q$).
     Model distribution (equivalent formulation): $Q \sim$ some prior on discrete measures $q$; $\beta_1, \beta_2, \ldots \overset{\text{iid}}{\sim} Q$ (given $Q$); $X_i \sim p_{\beta_i}$ (given $Q, \beta_1, \beta_2, \ldots$), independently for $i = 1, 2, \ldots$.
     [Figure: graphical model with nodes $Q$, $\beta_i$, $X_i$ and a plate of size $n$.]
     Let $T_n = \#\{\beta_1, \ldots, \beta_n\}$ (i.e. the number of distinct components so far).

  8. Many possible questions. Data: $X_1, X_2, \ldots \overset{\text{iid}}{\sim} f_{q_0}$. Write $X_{1:n} = (X_1, \ldots, X_n)$. Model: $Q \sim$ prior, $\beta_i \overset{\text{iid}}{\sim} Q$, $X_i \sim p_{\beta_i}$, and $T_n = \#\{\beta_1, \ldots, \beta_n\}$. Is the posterior consistent (and at what rate of convergence) ...
     (1) ... for the density? That is, does $P_{\text{model}}(\mathrm{dist}(f_Q, f_{q_0}) < \varepsilon \mid X_{1:n}) \xrightarrow[n \to \infty]{P_{\text{data}}} 1$ for all $\varepsilon > 0$? (Also, does this hold at any sufficiently smooth density, even when it is not a mixture from $\{p_\theta : \theta \in \Theta\}$?)
     (2) ... for the mixing distribution? That is, does $P_{\text{model}}(\mathrm{dist}(Q, q_0) < \varepsilon \mid X_{1:n}) \xrightarrow[n \to \infty]{P_{\text{data}}} 1$ for all $\varepsilon > 0$?
     (3) ... for the number of components? That is, does $P_{\text{model}}(T_n = s(q_0) \mid X_{1:n}) \xrightarrow[n \to \infty]{P_{\text{data}}} 1$?
     (Note: We use $T_n$ instead of $s(Q)$ since $s(Q) \overset{\text{a.s.}}{=} \infty$ in a DPM.)

  9. Answers for Dirichlet process mixtures (DPMs). In a DPM, $Q \sim \mathrm{DP}(\alpha H)$. Is the posterior consistent (and at what rate of convergence) ...
     (1) ... for the density? DPMs: Yes (optimal rate) (Ghosal & van der Vaart 2001, 2007). This holds for any sufficiently smooth density (in a certain sense). Contributions also by: Lijoi, Prünster, Walker, James, Tokdar, Dunson, Bhattacharya, Ghosh, Ramamoorthi, Wu, Khazaei, Rousseau, Balabdaoui, Tang.
     (2) ... for the mixing distribution? DPMs: Yes (optimal rate) (Nguyen 2012).
     (3) ... for the number of components? DPMs: Not consistent. (Note: Ignoring tiny components when computing $T_n$ might fix this issue.)
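For intuition about how $T_n$ behaves under the DPM prior alone (before conditioning on any data), the following sketch simulates $T_n$ when $Q \sim \mathrm{DP}(\alpha H)$. It assumes $H$ is nonatomic, which the slides do not state explicitly, and relies on the standard Pólya urn fact that the $(i+1)$-th draw $\beta_{i+1}$ is a new value with probability $\alpha / (\alpha + i)$. The function names are hypothetical, and this illustrates only the prior, not the posterior inconsistency result of the talk.

```python
import random

def sample_num_distinct(n, alpha, rng):
    """Draw T_n = #{beta_1, ..., beta_n} under a DP(alpha * H) prior.

    Assumes H is nonatomic, so a "new" draw never ties with an old one.
    """
    t = 0
    for i in range(n):
        # Given i previous draws, the next one is new w.p. alpha / (alpha + i).
        if rng.random() < alpha / (alpha + i):
            t += 1
    return t

def expected_num_distinct(n, alpha):
    """Prior mean E[T_n] = sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))

rng = random.Random(0)
draws = [sample_num_distinct(1000, 1.0, rng) for _ in range(200)]
print(sum(draws) / len(draws), expected_num_distinct(1000, 1.0))
```

Only the new-versus-old indicator is tracked, because $T_n$ depends on how many draws are new and not on which existing value a repeated draw joins.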

  10. Outline of the talk. Section 2: A consistent alternative: Mixture of finite mixtures (MFM).

  11. Mixture of finite mixtures (MFM). Many authors have considered the following natural alternative to DPMs, e.g. Nobile (1994, 2000, 2004, 2005, 2007), Richardson & Green (1997, 2001), Stephens (2000), Zhang et al. (2004), Kruijer (2008), Rousseau (2010), Kruijer, Rousseau, & van der Vaart (2010). Instead of $Q \sim \mathrm{DP}(\alpha H)$, choose $Q$ as a mixture over finite mixtures:
     $S \sim p(s)$, a p.m.f. on $\{1, 2, \ldots\}$;
     $\pi \sim \mathrm{Dirichlet}(\alpha_{s1}, \ldots, \alpha_{ss})$ (given $S = s$);
     $\theta_1, \ldots, \theta_s \overset{\text{iid}}{\sim} H$ (given $S = s$);
     $Q = \sum_{i=1}^{S} \pi_i \delta_{\theta_i}$.
     [Figure: graphical model with nodes $S$, $\pi$, $\theta$, $Q$, $X_i$ and a plate of size $n$.]
     For mathematical convenience, we suggest: $H$ as a conjugate prior for $\{p_\theta\}$; $p(s) = \mathrm{Poisson}(s - 1 \mid \lambda)$; $\alpha_{ij} = \alpha > 0$ for all $i, j$.
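The generative recipe above can be sketched as a short sampler for $Q$ under the suggested choices $p(s) = \mathrm{Poisson}(s - 1 \mid \lambda)$ and $\alpha_{ij} = \alpha$. This is a minimal sketch, not the authors' implementation: the helper names are hypothetical, the Poisson draw uses Knuth's multiplication method (reasonable only for small $\lambda$), and the base measure $H$ in the usage lines is an arbitrary stand-in (a normal prior on a component mean) chosen purely for illustration.

```python
import math
import random

def sample_poisson(lam, rng):
    """Poisson(lam) draw via Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_mfm_mixing_distribution(lam, alpha, sample_h, rng):
    """Draw Q = sum_{i=1}^S pi_i * delta_{theta_i} from the MFM prior."""
    s = 1 + sample_poisson(lam, rng)            # S - 1 ~ Poisson(lam)
    # Normalized Gamma(alpha, 1) variates are Dirichlet(alpha, ..., alpha).
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(s)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    atoms = [sample_h(rng) for _ in range(s)]   # theta_i iid ~ H
    return weights, atoms

# Hypothetical base measure H: component mean ~ N(0, 5^2).
rng = random.Random(1)
weights, atoms = sample_mfm_mixing_distribution(
    lam=2.0, alpha=1.0, sample_h=lambda r: r.gauss(0.0, 5.0), rng=rng
)
print(len(weights), weights, atoms)
```

Unlike a draw from $\mathrm{DP}(\alpha H)$, every $Q$ produced this way has finitely many atoms, so $s(Q)$ is finite and has prior $p(s)$.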

  12. Answers for MFM models. Is the posterior consistent (and at what rate of convergence) ...
     (1) ... for the density? DPMs: Yes (optimal rate). MFMs: Yes (optimal rate). Doob's theorem gives consistency at Lebesgue almost-all mixing distributions $q_0$. For any sufficiently smooth density, convergence at the optimal rate was proven by Kruijer (2008) and Kruijer, Rousseau, & van der Vaart (2010) (in the same sense as for DPMs).
     (2) ... for the mixing distribution? DPMs: Yes (optimal rate). MFMs: Yes. Doob's theorem guarantees consistency, as before. Optimal rate?
     (3) ... for the number of components? DPMs: Not consistent. MFMs: Yes. By Doob's theorem, again.

  13. Outline of the talk. Section 3: Empirical demonstrations.

  14. Toy example #1: One normal component. Data: $N(0, 1)$. [Figure: prior (x) and estimated posterior (o) of $T_n$.] Each plot is the average over 5 datasets. Burn-in: 10,000 sweeps. Sample: 100,000 sweeps.

  15. Toy example #2: Two normal components. Data: $\tfrac{1}{2} N(0, 1) + \tfrac{1}{2} N(6, 1)$. [Figure: prior (x) and estimated posterior (o) of $T_n$.] Each plot is the average over 5 datasets. Burn-in: 10,000 sweeps. Sample: 100,000 sweeps.

  16. Toy example #3: Five normal components. Data: $\sum_{k=-2}^{2} \tfrac{1}{5} N(4k, \tfrac{1}{2})$. [Figure: prior (x) and estimated posterior (o) of $T_n$.] Each plot is the average over 5 datasets. Burn-in: 10,000 sweeps. Sample: 100,000 sweeps.
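The three toy data distributions can be simulated with a few lines of Python, which may help in reproducing the setup. This sketch only generates data; it does not run the samplers behind the plots. The names are hypothetical, and it assumes the second argument of $N(\cdot, \cdot)$ is a variance, consistent with $\theta = (\mu, \sigma^2)$ earlier in the talk, although the slides do not say so for these examples.

```python
import random

def sample_mixture(n, weights, params, rng):
    """Draw n points from sum_i weights[i] * N(mean_i, var_i)."""
    data = []
    for _ in range(n):
        # Pick a component by its weight, then draw from that normal.
        mean, var = rng.choices(params, weights=weights, k=1)[0]
        data.append(rng.gauss(mean, var ** 0.5))  # gauss takes a std. dev.
    return data

# (weights, [(mean, variance), ...]) for the three toy examples.
toy1 = ([1.0], [(0.0, 1.0)])
toy2 = ([0.5, 0.5], [(0.0, 1.0), (6.0, 1.0)])
toy3 = ([0.2] * 5, [(4.0 * k, 0.5) for k in range(-2, 3)])

rng = random.Random(0)
x = sample_mixture(500, toy3[0], toy3[1], rng)
print(len(x), min(x), max(x))
```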
