Bayesian estimation of the discrepancy with misspecified parametric models

Pierpaolo De Blasi
University of Torino & Collegio Carlo Alberto

Bayesian Nonparametrics workshop, ICERM, 17-21 September 2012
Joint work with S. Walker
Outline

  Semiparametric density estimation
  Asymptotics and illustration
  References
BNP density estimation

• Let X_1, ..., X_n be exchangeable (i.e. conditionally iid) observations from an unknown density f on the real line.
• If F is the density space and Π(df) the prior, Bayes' theorem gives

    Π(A | X_1, ..., X_n) = ∫_A ∏_{i=1}^n f(X_i) Π(df) / ∫_F ∏_{i=1}^n f(X_i) Π(df)

• Wealth of Bayesian nonparametric (BNP) models:
  – Dirichlet process mixtures of continuous densities;
  – log-spline models;
  – Bernstein polynomials;
  – log Gaussian processes.
• All with well-studied asymptotic properties, e.g. posterior concentration rates

    Π(f : d(f, f_0) > M ε_n | X_1, ..., X_n) → 0  as n → ∞,

  when X_1, X_2, ... are iid from some "true" f_0.
Discrepancy from a parametric model

• Suppose now we have a favorite parametric family f_θ(x), θ ∈ Θ ⊂ R^p, likely to be misspecified: there is no θ such that f_0 = f_θ.
• We want to learn about the best parameter value θ_0, which minimizes the Kullback–Leibler divergence from the true f_0:

    θ_0 = argmin_{θ ∈ Θ} ∫ f_0 log(f_0 / f_θ)

• A nonparametric component W is introduced to model the discrepancy between f_0 and the closest density f_{θ_0}:

    f_{θ,W}(x) ∝ f_θ(x) W(x),

  so that

    C(x) := W(x) / ∫ W(s) f_θ(s) ds

  is designed to estimate C_0(x) = f_0(x) / f_{θ_0}(x).
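The KL-minimizing parameter θ_0 can be computed numerically for a concrete instance. A minimal sketch, where all concrete choices (the mixture f_0 and the misspecified location family N(θ, 1)) are illustrative assumptions not taken from the slides; for a normal location family with fixed scale, θ_0 is simply the mean of f_0:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Illustrative setup (not from the slides): the "true" f0 is a normal
# mixture, the misspecified parametric family is N(theta, 1).
def f0(x):
    return 0.7 * norm.pdf(x, 0.0, 1.0) + 0.3 * norm.pdf(x, 3.0, 1.0)

xs = np.linspace(-10.0, 15.0, 4001)

def kl(theta):
    # KL(f0 || f_theta) up to the entropy of f0, which is constant in theta
    return -np.trapz(f0(xs) * norm.logpdf(xs, theta, 1.0), xs)

theta0 = minimize_scalar(kl, bounds=(-5.0, 8.0), method="bounded").x
# For this location family the KL minimizer is E_{f0}[X] = 0.7*0 + 0.3*3 = 0.9
print(theta0)
```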
Related works – Frequentist: Hjort and Glad (1995)

• Start with a parametric density estimate f_{θ̂}(x), θ̂ being, e.g., the MLE of θ, maximizing the log-likelihood ∑_{i=1}^n log f_θ(x_i).
• Then multiply it by a nonparametric kernel-type estimate of the correction function r(x) = f_0(x) / f_{θ̂}(x):

    f̂(x) = f_{θ̂}(x) r̂(x) = (1/n) ∑_{i=1}^n K_h(x_i − x) f_{θ̂}(x) / f_{θ̂}(x_i)

  in a two-stage sequential analysis.
• f̂ is shown to be more precise than the traditional kernel density estimator in a broad neighborhood around the parametric family, while losing little when f_0 is far from the parametric family.
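The two-stage estimator above can be sketched in a few lines of Python. All concrete choices here are assumptions for illustration (gamma-distributed data, a normal parametric start, a Gaussian kernel, and a hand-picked bandwidth h), not part of Hjort and Glad's prescription:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Assumed data-generating density: a Gamma(3, 1); the parametric start
# N(mu, sigma) is deliberately misspecified (it ignores the skewness).
x = rng.gamma(shape=3.0, scale=1.0, size=500)

# Stage 1: parametric fit (MLE of the normal family)
mu_hat, sigma_hat = x.mean(), x.std()
f_par = lambda t: norm.pdf(t, mu_hat, sigma_hat)

def hjort_glad(t, h=0.4):
    # Stage 2: fhat(t) = f_par(t) * (1/n) * sum_i K_h(x_i - t) / f_par(x_i),
    # i.e. a kernel estimate of the correction r(t) = f0(t) / f_par(t)
    t = np.asarray(t)
    K = norm.pdf((x[None, :] - t[:, None]) / h) / h
    return f_par(t) * np.mean(K / f_par(x), axis=1)

grid = np.linspace(0.0, 12.0, 601)
fhat = hjort_glad(grid)
print(np.trapz(fhat, grid))  # close to 1, up to kernel and boundary bias
```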
Related works – Bayes

Nonparametric prior built around a parametric model via

    f(x) = f_θ(x) g(F_θ(x)),

where F_θ is the cdf of f_θ and g is a density on [0, 1] with prior Π.

• Verdinelli and Wasserman (1999): Π as an infinite exponential family. Application to goodness-of-fit testing.
• Rousseau (2008): Π as a mixture of betas. Application to goodness-of-fit testing.
• Tokdar (2007): Π as a log Gaussian process prior. Application to posterior inference for densities with unbounded support.

  For g(x) = e^{Z(x)} / ∫_0^1 e^{Z(s)} ds and Z a Gaussian process with covariance σ(·, ·), f(x) can be written

      f(x) ∝ f_θ(x) e^{Z̃(x)},   with W(x) = e^{Z̃(x)},

  for Z̃ a Gaussian process with covariance σ(F_θ(·), F_θ(·)).
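That f(x) = f_θ(x) g(F_θ(x)) is automatically a density, for any density g on [0, 1], follows from the change of variables u = F_θ(x), which gives ∫ f = ∫_0^1 g(u) du = 1. A quick numerical check of this, with an arbitrary standard normal f_θ and an arbitrary Beta(2, 5) choice for g (both assumptions for illustration):

```python
import numpy as np
from scipy.stats import beta, norm

# Perturb a standard normal f_theta by a Beta(2, 5) density g on [0, 1]
xs = np.linspace(-8.0, 8.0, 4001)
f = norm.pdf(xs) * beta.pdf(norm.cdf(xs), 2, 5)

# By the substitution u = F_theta(x), the integral equals int_0^1 g = 1
print(np.trapz(f, xs))
```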
Posterior updating

    f_{θ,W}(x) ∝ f_θ(x) W(x),   C(x) := W(x) / ∫ W(s) f_θ(s) ds.

• Truly semi-parametric: the aim is first to learn about the best parameter θ_0, then to see how close f_{θ_0} is to f_0 via C(x).
• A situation in which the updating process from prior to posterior may be seen as problematic: the model f_{θ,W} is intrinsically non-identified in (θ, C).
• The full Bayesian update π̃(θ, W | x_1, …, x_n) ∝ π(θ) π(W) ∏_{i=1}^n f_{θ,W}(x_i) is appropriate for learning about f_0; it is not so for learning about (θ_0, C_0).
• The marginal posterior π̃(θ | x_1, …, x_n) = ∫ π̃(θ, W | x_1, …, x_n) dW has no interpretation: it is not identified what parameter value this π̃ is targeting.
Posterior updating

• What removes us from the formal Bayes set-up is the desire to specifically learn about θ_0.
• θ_0 is defined without any reference to W or C. Whether we are interested in learning about C_0 or not, our beliefs about θ_0 should not change.
• Hence, the appropriate update for θ is the parametric one:

    π(θ | x_1, …, x_n) ∝ π(θ) ∏_{i=1}^n f_θ(x_i).

• We keep updating W according to the semi-parametric model,

    π̃(W | θ, x_1, …, x_n) ∝ π(W) ∏_{i=1}^n f_{θ,W}(x_i),

  so our updating scheme is the non-full Bayesian update

    π(θ, W | x_1, …, x_n) = π̃(W | θ, x_1, …, x_n) π(θ | x_1, …, x_n).
Posterior updating

    π(θ, W | x_1, …, x_n) = π̃(W | θ, x_1, …, x_n) π(θ | x_1, …, x_n).

• (θ, W) are estimated sequentially, with W reflecting the additional uncertainty on θ.
• Marginalization of the posterior over W is well defined,

    π(W | x_1, …, x_n) = ∫_Θ π̃(W | θ, x_1, …, x_n) π(dθ | x_1, …, x_n),

  since π(θ | x_1, …, x_n) describes the beliefs about the real parameter θ_0.
• Coherence is about properly defining the quantities of interest and showing that the Bayesian updates provide learning about these quantities; this is checked by what is yielded asymptotically.
• Hence we seek frequentist validation: we show that the posterior of (θ, C) converges to a point mass at (θ_0, C_0).
Lenk (2003)

• Let I be a compact interval on the real line and Z a Gaussian process. Lenk (2003) considers the semi-parametric density model

    f(x) = f_θ(x) e^{Z(x)} / ∫_I f_θ(s) e^{Z(s)} ds

  for f_θ(x) a member of the exponential family.
• In the Karhunen–Loève expansion of Z(x), the orthogonal basis is chosen so that the sample paths integrate to zero.
• Further assumption for identification: the orthogonal basis does not contain any of the canonical statistics of f_θ(x).
• Estimation is based on truncation of the series expansion or on imputation of the Gaussian process at a fixed grid of points, see Tokdar (2007).
Bounded W(x)

• Building upon Lenk (2003), we keep working with Gaussian processes and consider

    f_{θ,W}(x) = f_θ(x) W(x) / ∫_I f_θ(s) W(s) ds,   W(x) = Ψ(Z(x)),

  where Ψ(u) is a cdf having a smooth, unimodal, symmetric density ψ(u) on the real line.
• With an additional condition on Ψ(u), we can show that W(x) preserves the asymptotic properties of the log Gaussian process prior.
• On the other hand, with W(x) ≤ 1, Walker (2011) describes a latent model which can deal with the intractable normalizing constant. It is based on

    [ ∫ W(s) f_θ(s) ds ]^{−n} = ∑_{k=0}^∞ (n+k−1 choose k) [ ∫ f_θ(s)(1 − W(s)) ds ]^k.
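The identity above is the negative binomial expansion a^{−n} = ∑_k (n+k−1 choose k)(1 − a)^k, applied with a = ∫ W(s) f_θ(s) ds and using that 1 − a = ∫ f_θ(s)(1 − W(s)) ds because f_θ integrates to one and 0 ≤ W ≤ 1. A quick numerical sanity check, with arbitrary values for a and n:

```python
from math import comb

# a stands for the normalizing constant int W(s) f_theta(s) ds, which lies
# in (0, 1) since 0 <= W <= 1; the values a = 0.7 and n = 5 are arbitrary.
a, n = 0.7, 5
series = sum(comb(n + k - 1, k) * (1 - a) ** k for k in range(200))
print(series, a ** (-n))  # the two agree
```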
Link function Ψ(u)

• Lipschitz condition on log Ψ(u): ψ(u)/Ψ(u) ≤ m uniformly on R. Satisfied by the standard Laplace cdf, the standard logistic cdf and the standard Cauchy cdf, but not by the standard normal cdf.
• For fixed θ, write p_z = f_{θ,Ψ(z)}. It can be shown that, when ‖z_1 − z_2‖_∞ < ε,

    h(p_{z_1}, p_{z_2}) ≤ m ε e^{mε/2},
    K(p_{z_1}, p_{z_2}) ≤ m² ε² e^{mε} (1 + mε).

• The posterior asymptotic results of van der Vaart and van Zanten (2008) carry over to this setting: if Ψ^{−1}(f_0/f_θ) is contained in the support of Z, then

    Π{ p_z : h(p_z, f_0) > ε | X_1, …, X_n } → 0,   F_0^∞-a.s.

  Results on the posterior contraction rate can also be derived.
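The ratio ψ(u)/Ψ(u) (a reversed hazard rate) can be probed numerically for the four cdfs mentioned; it only threatens to blow up as u → −∞, and for the standard normal it grows roughly like |u| there (the Mills ratio), so no uniform bound m exists. A small sketch, with an arbitrary probing grid:

```python
import numpy as np
from scipy.stats import cauchy, laplace, logistic, norm

# The condition psi(u)/Psi(u) <= m only bites as u -> -inf, so probe there.
u = np.linspace(-30.0, 0.0, 301)
ratios = {}
for name, dist in [("laplace", laplace), ("logistic", logistic),
                   ("cauchy", cauchy), ("normal", norm)]:
    # psi/Psi evaluated on the log scale for numerical stability
    ratios[name] = np.exp(dist.logpdf(u) - dist.logcdf(u)).max()
    print(name, ratios[name])
# laplace, logistic and cauchy stay bounded on the grid (the first two by 1);
# the normal ratio is already ~30 at u = -30 and is unbounded below.
```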
Conditional posterior of W

(A) Lipschitz condition on log Ψ(u);
(B) f_θ(x) is continuous and bounded away from zero;
(C) the support of Z contains C(I), the space of continuous functions on I.

Theorem 1. Under assumptions (A), (B) and (C), the conditional posterior of W given θ is exponentially consistent at all continuous f_0 on I, i.e. for any ε > 0,

    π̃{ W : h(f_{θ,W}, f_0) > ε | θ, X_1, …, X_n } ≤ e^{−dn},   F_0^∞-a.s.,

for some d > 0 as n → ∞.

• As a corollary, for fixed θ, the posterior of C(x) = W(x) / ∫_I f_θ(s) W(s) ds consistently estimates the discrepancy f_0(x)/f_θ(x).
• The exponential convergence to 0 is a by-product of standard techniques for proving posterior consistency.