Aspects of symmetric Gamma process mixtures
Zacharie Naulet (Paris-Dauphine University)
Joint work with: Judith Rousseau (Paris-Dauphine University) and Éric Barat (Commissariat à l'Énergie Atomique)
Colloque JPS | 20th April 2016
Outline
1 Introduction / Bayesian statistics
2 Symmetric Gamma Process mixtures
3 Asymptotic results: general theorems; application to mixtures
Frequentist vs Bayes

Frequentist approach
• Choose a model $\mathcal{P}^n = \{P^n_\theta : \theta \in \Theta\}$.
• Observations $Y := (Y_1, \dots, Y_n) \in \mathcal{Y}^n$ are random variables with joint distribution $P^n_{\theta_0} \in \mathcal{P}^n$, where $\theta_0$ is unknown but assumed to be deterministic.
• Build an estimator $\hat\theta_n(Y)$ of $\theta_0$ (ideally converging to $\theta_0$ under $P_{\theta_0}$).

Bayesian approach
• Observations $Y = (Y_1, \dots, Y_n)$ and the parameter $\theta$ are random variables with joint distribution $\Pi$ on $\mathcal{Y}^n \times \Theta$.
• $P^n_\theta(\cdot) = \Pi(\cdot \mid \theta)$.
• The marginal of $\Pi$ on $\Theta$, written $\Pi_\theta$, is called the prior distribution.
• The model is the probability space $(\Theta, \Sigma, \Pi_\theta)$.

In both cases, the model is
1 parametric if $\Theta$ is a finite-dimensional vector space;
2 nonparametric if $\Theta$ is an infinite-dimensional vector space.
Bayesian estimation

The conditional distribution $\Pi_{\theta \mid Y}$ is called the posterior distribution, and is given by Bayes' rule:
$$\Pi_{\theta \mid Y}(U \mid B) = \frac{\Pi_{Y \mid \theta}(B \mid U)\, \Pi_\theta(U)}{\Pi_Y(B)}.$$

Bayesian point estimators
• Posterior mean: $\hat\theta_n(Y) = \int_\Theta \theta \, d\Pi(\theta \mid Y_1, \dots, Y_n)$.
• If the posterior is dominated on $\Theta$, maximum a posteriori (MAP): $\hat\theta_n = \arg\max_{\theta \in \Theta} \pi(\theta \mid Y_1, \dots, Y_n)$.

Credible intervals
• $U$ is a credible interval with level $\alpha$ if $\Pi(\theta \in U \mid Y_1, \dots, Y_n) = 1 - \alpha$.
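To make these point estimators concrete, here is a minimal sketch in a conjugate Beta-Bernoulli toy model (chosen purely for illustration; the talk itself works in a nonparametric setting where no such closed form exists):

```python
import numpy as np

# Toy Beta-Bernoulli model: closed-form posterior mean and MAP.
rng = np.random.default_rng(0)
theta0 = 0.3                           # "true" parameter (classical view)
y = rng.binomial(1, theta0, size=100)  # observations Y_1, ..., Y_n

a, b = 1.0, 1.0                        # Beta(a, b) prior on theta
a_post = a + y.sum()                   # conjugate update: the posterior
b_post = b + len(y) - y.sum()          # is Beta(a_post, b_post)

post_mean = a_post / (a_post + b_post)           # posterior mean
post_map = (a_post - 1) / (a_post + b_post - 2)  # MAP = mode of the Beta density
print(post_mean, post_map)
```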
Bayesian estimation

Two kinds of Bayesians:
1 Classical: the classical Bayesian believes in the existence of a true parameter to be estimated from the data (e.g. Laplace, Bayes).
2 Subjectivist: the subjectivist Bayesian rejects the idea of a true parameter; there are no objective probability models (e.g. Diaconis, De Finetti).

More details in: Persi Diaconis and David Freedman (1986). "On the consistency of Bayes estimates". The Annals of Statistics, pp. 1–26.

If we are classical Bayesians, we probably want our posterior distribution to converge (in some sense) to a degenerate distribution at $\theta_0$ as the data accumulate.
• Frequentist consistency: an estimator $\hat\theta_n(Y)$ is consistent at $\theta_0$ (in the distance $d$) if $d(\hat\theta_n, \theta_0) \to 0$ in $P^\infty_{\theta_0}$-probability.
• Bayesian consistency: the sequence of posterior distributions $\{\Pi_n(\cdot \mid Y)\}$ is consistent at $\theta_0$ (in the distance $d$) if, for all $\epsilon > 0$, $\Pi_n(\{\theta : d(\theta, \theta_0) \geq \epsilon\} \mid Y_1, \dots, Y_n) \to 0$, $P^\infty_{\theta_0}$-a.s.
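Bayesian consistency can be watched numerically in the same toy Beta-Bernoulli model as before (again an illustrative assumption, not the talk's nonparametric setting): the posterior mass outside an $\epsilon$-ball around $\theta_0$ shrinks as $n$ grows.

```python
import numpy as np
from scipy import stats

# Posterior concentration in the toy Beta-Bernoulli model.
rng = np.random.default_rng(1)
theta0, eps = 0.3, 0.05

for n in [10, 100, 1000, 10000]:
    y = rng.binomial(1, theta0, size=n)
    post = stats.beta(1 + y.sum(), 1 + n - y.sum())
    # Posterior mass of {theta : |theta - theta0| >= eps}
    mass = post.cdf(theta0 - eps) + post.sf(theta0 + eps)
    print(n, mass)  # goes to 0 as n grows: the posterior concentrates at theta0
```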
2 Symmetric Gamma Process mixtures
Problem Statement

We consider the nonparametric (direct or indirect) regression problem with $f : \mathbb{R}^d \to \mathbb{C}$ and data $(X_i, Y_i)$, $i = 1, \dots, n$, with
$$E[Y_i \mid X_i] = T(f)(X_i), \quad X_i \in \mathcal{X},$$
from a Bayesian perspective, where $T$ is a given operator (the identity in the direct case).

Canonical example: Gaussian mean regression, with $f : \mathbb{R}^d \to \mathbb{R}$:
$$Y_i \mid X_i, \epsilon_i = f(X_i) + \epsilon_i, \quad i = 1, \dots, n,$$
$$\epsilon_1, \dots, \epsilon_n \overset{iid}{\sim} \mathcal{N}(0, \sigma^2), \qquad f \sim \Pi.$$
The posterior distribution $\Pi(\cdot \mid Y_1, \dots, Y_n)$ is given by
$$\underbrace{\Pi(f \in U \mid Y_1, \dots, Y_n)}_{\text{posterior}} \propto \int_U \underbrace{L(f \mid Y_1, \dots, Y_n)}_{\text{likelihood}} \, \underbrace{\Pi(df)}_{\text{prior}}.$$
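A minimal conjugate sketch of Gaussian mean regression: we place a Gaussian prior on coefficients in a finite Gaussian-bump basis, a finite-dimensional stand-in (our assumption, not the talk's mixture prior) under which the posterior over $f$ is Gaussian and available in closed form.

```python
import numpy as np

# Gaussian mean regression with a Gaussian prior on basis coefficients.
rng = np.random.default_rng(2)
n, sigma = 50, 0.1
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + sigma * rng.normal(size=n)   # Y_i = f(X_i) + eps_i

centers = np.linspace(-1, 1, 20)                 # Gaussian bumps as basis
Phi = np.exp(-0.5 * ((x[:, None] - centers) / 0.2) ** 2)

tau2 = 1.0                                       # prior: beta ~ N(0, tau2 * I)
A = Phi.T @ Phi / sigma**2 + np.eye(len(centers)) / tau2
post_cov = np.linalg.inv(A)                      # posterior covariance of beta
post_mean = post_cov @ Phi.T @ y / sigma**2      # posterior mean of beta

# Posterior mean of f at new points:
x_new = np.linspace(-1, 1, 5)
Phi_new = np.exp(-0.5 * ((x_new[:, None] - centers) / 0.2) ** 2)
print(Phi_new @ post_mean)
```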
Prior distributions on function spaces

Brief (non-exhaustive) state of the art for prior distributions on function spaces.

Regression:
• Gaussian processes (Rasmussen, 2004).

Density estimation:
• Dirichlet process mixtures (Escobar and West, 1995).

Idea: use kernel mixture models in regression problems
• Abramovich, Sapatinas, and Silverman (2000),
• Wolpert, Ickstadt, and Hansen (2003),
• Pillai et al. (2007) and Pillai (2008),
• Wolpert, Clyde, and Tu (2011),
• Malou (2014),
• This talk, and Naulet and Barat (2015).
Kernel mixture models

Let
• $G$ be a measurable space,
• $\mathcal{M}(G)$ be the set of signed (or complex-valued) measures on $G$,
• $\Pi^*(dQ)$ be a prior distribution on $\mathcal{M}(G)$,
• $\Phi : G \times \mathbb{R}^d \to \mathbb{R}$ be a kernel function.

Then $\Pi^*(dQ)$ induces a prior distribution on an abstract space of functions $f : \mathbb{R}^d \to \mathbb{R}$ through the mapping
$$\mathcal{M}(G) \ni Q \mapsto \int_G \Phi(x; \cdot) \, dQ(x).$$
Let $\Pi(df)$ denote this prior distribution:
$$f \sim \Pi(df) \iff f(\cdot) = \int_G \Phi(x; \cdot) \, dQ(x), \quad Q \sim \Pi^*(dQ).$$
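A sketch of one induced prior draw, assuming (for illustration only) that a realization $Q$ is an atomic signed measure $Q = \sum_j w_j \delta_{x_j}$ and $\Phi$ is a Gaussian location kernel; the weight law and bandwidth below are placeholders, not the talk's choices.

```python
import numpy as np

# One draw of f(.) = integral Phi(x; .) dQ(x) for an atomic signed Q.
rng = np.random.default_rng(3)
J = 30
locs = rng.uniform(-1, 1, size=J)       # atom locations x_j in G
weights = rng.laplace(0, 0.5, size=J)   # signed weights w_j (placeholder law)

def f(t, bw=0.15):
    """Mixture f(t) = sum_j w_j Phi(x_j; t) with a Gaussian kernel."""
    return np.sum(weights * np.exp(-0.5 * ((t - locs) / bw) ** 2))

print([f(t) for t in (-0.5, 0.0, 0.5)])
```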
Kernel mixture models

Examples of prior distributions on $\mathcal{M}(G)$ (i.e. random measures):
• Dirichlet processes,
• Completely Random Measures (Kingman, 1967; Kingman, 1992; Naulet and Barat, 2015),
• Lévy Random Measures (Wolpert, Clyde, and Tu, 2011; Pillai, 2008; Rajput and Rosinski, 1989; Barndorff-Nielsen and Schmiegel, 2004).

Examples of kernels (see the sketch after this list):
• Location-scale kernels:
$$f(\cdot) := \int \sigma^{-1} g\!\left(\frac{\cdot - \mu}{\sigma}\right) dQ(\mu, \sigma),$$
• Location-modulation kernels:
$$f(\cdot) := \int g(\cdot - \mu) \cos(\langle \omega, \cdot \rangle + \theta) \, dQ(\mu, \omega, \theta),$$
• . . .
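A companion sketch for the two kernel families, again assuming an atomic $Q$ and a Gaussian $g$ in dimension $d = 1$ (so $\langle \omega, t \rangle = \omega t$); only the kernel changes relative to the previous snippet.

```python
import numpy as np

# Location-scale vs location-modulation mixtures with atomic Q (d = 1).
rng = np.random.default_rng(4)
J = 20
w = rng.laplace(0, 0.5, size=J)                   # signed weights

# Location-scale: atoms are (mu_j, sigma_j)
mu, sigma = rng.uniform(-1, 1, J), rng.uniform(0.05, 0.5, J)
f_ls = lambda t: np.sum(w / sigma * np.exp(-0.5 * ((t - mu) / sigma) ** 2))

# Location-modulation: atoms are (mu_j, omega_j, theta_j)
om, th = rng.uniform(0, 10, J), rng.uniform(0, 2 * np.pi, J)
f_lm = lambda t: np.sum(w * np.exp(-0.5 * (t - mu) ** 2) * np.cos(om * t + th))

print(f_ls(0.2), f_lm(0.2))
```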
Symmetric Gamma Random Measures

Symmetric Gamma Random Measures (SGRM) are distributions over the space of signed measures (i.e. random signed measures).

We let
1 $(\Omega, \mathcal{F}, P)$ be a probability space, and
2 $(G, \Sigma_G)$ be a measurable space.
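One concrete way to picture an SGRM on a finite partition of $G$, before the formal definition: to disjoint sets it assigns independent symmetric Gamma masses, and a symmetric Gamma variable can be realized as the difference of two i.i.d. Gamma variables. The parametrization below is our assumption for illustration, not the talk's.

```python
import numpy as np

# Sketch: masses Q(A_1), ..., Q(A_k) of a symmetric Gamma random measure
# on a partition, via differences of independent Gamma variables.
rng = np.random.default_rng(5)
alpha, beta = 2.0, 1.0                  # scaling and scale (assumed values)
base = np.array([0.2, 0.3, 0.1, 0.4])   # nu(A_1), ..., nu(A_k), base measure nu

g1 = rng.gamma(alpha * base, beta)      # Gamma(alpha * nu(A_i), beta), independent
g2 = rng.gamma(alpha * base, beta)
Q = g1 - g2                             # Q(A_i): symmetric Gamma, can be negative
print(Q, Q.sum())                       # a signed measure evaluated on the partition
```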