Parameter estimation (cont.)
Dr. Jarad Niemi
STAT 544 - Iowa State University
January 24, 2019
Outline

- Normal model, unknown mean: Jeffreys prior, natural conjugate prior, posterior
- Normal model, unknown variance: Jeffreys prior, natural conjugate prior, posterior
- JAGS/Stan: binomial model (unknown probability), normal model (unknown mean), normal model (unknown variance)
Normal model, unknown mean: Jeffreys prior for µ

Theorem. If $Y_i \stackrel{iid}{\sim} N(\mu, s^2)$ with $s^2$ known, the Jeffreys prior for $\mu$ is $p(\mu) \propto 1$.

Proof. Since the normal distribution with unknown mean is an exponential family, use Casella & Berger Lemma 7.3.11:
$$
\begin{aligned}
-E_y\left[\frac{\partial^2}{\partial\mu^2} \log p(y|\mu)\right]
&= -E_y\left[\frac{\partial^2}{\partial\mu^2}\left(-n\log(2\pi s^2)/2 - \frac{1}{2s^2}\sum_{i=1}^n (y_i-\mu)^2\right)\right] \\
&= -E_y\left[\frac{\partial^2}{\partial\mu^2}\left(-n\log(2\pi s^2)/2 - \frac{1}{2s^2}\left(\sum_{i=1}^n y_i^2 - 2\mu n\bar{y} + n\mu^2\right)\right)\right] \\
&= -E_y\left[\frac{\partial}{\partial\mu}\left(-\frac{1}{2s^2}\left(-2n\bar{y} + 2n\mu\right)\right)\right] \\
&= -E_y\left[-\frac{1}{2s^2}(2n)\right] = n/s^2
\end{aligned}
$$
so
$$ p(\mu) \propto \sqrt{|I(\mu)|} = \sqrt{n/s^2} \propto 1. $$
Thus the Jeffreys prior for $\mu$ is $p(\mu) \propto 1$.
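Since the log-likelihood is quadratic in $\mu$, its second derivative is constant at $-n/s^2$, so the expectation step is trivial. Below is a quick numerical sanity check (a sketch with illustrative values; none of it comes from the course code): approximate the second derivative by central differences and compare with $n/s^2$.

# Hypothetical values for a numerical check of I(mu) = n/s^2
set.seed(1)
n  = 5; s2 = 2; mu = 0.3
y  = rnorm(n, mu, sqrt(s2))
loglik = function(m) sum(dnorm(y, m, sqrt(s2), log = TRUE))
h  = 1e-4
d2 = (loglik(mu + h) - 2*loglik(mu) + loglik(mu - h)) / h^2  # second derivative
c(numerical = -d2, analytical = n/s2)                        # should agree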
Normal model, unknown mean: posterior propriety

Since $\int_{-\infty}^{\infty} 1 \, d\mu$ is not finite, we need to check posterior propriety.

Theorem. For $n > 0$, the posterior for a normal mean (known variance) using Jeffreys prior is proper.

Proof. The posterior is
$$
\begin{aligned}
p(\mu|y) &\propto p(y|\mu)p(\mu) \\
&\propto \exp\left(-\frac{1}{2s^2}\sum_{i=1}^n (y_i-\mu)^2\right) \times 1 \\
&\propto \exp\left(-\frac{1}{2s^2}\left(-2\mu n\bar{y} + n\mu^2\right)\right) \\
&= \exp\left(-\frac{1}{2s^2/n}\left(\mu^2 - 2\mu\bar{y}\right)\right).
\end{aligned}
$$
This is the kernel of a normal distribution with mean $\bar{y}$ and variance $s^2/n$, which is proper if $n > 0$.
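The propriety claim can also be verified numerically (a sketch with illustrative values, not from the slides): integrate the unnormalized posterior kernel over the real line and confirm the result is finite.

# Numerical check of propriety for the flat-prior posterior
set.seed(2)
s2 = 1
y  = rnorm(3, 1, sqrt(s2))
post_kernel = function(mu)
  sapply(mu, function(m) exp(-sum((y - m)^2) / (2*s2)))
integrate(post_kernel, -Inf, Inf)  # finite value, so the posterior is proper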
Normal model, unknown mean: natural conjugate prior

Let $Y_i \stackrel{iid}{\sim} N(\mu, s^2)$ with $s^2$ known. The likelihood is
$$
L(\mu) = \exp\left(-\frac{1}{2s^2/n}\left(\mu^2 - 2\mu\bar{y}\right)\right)
\propto \exp\left(-\frac{1}{2}\left(\frac{n}{s^2}\mu^2 - 2\mu\frac{n}{s^2}\bar{y}\right)\right).
$$
This is the kernel of a normal distribution, so the natural conjugate prior is $\mu \sim N(m, C)$. Then
$$
\begin{aligned}
p(\mu|y) &\propto p(y|\mu)p(\mu) = L(\mu)p(\mu) \\
&= \exp\left(-\frac{1}{2}\left(\frac{n}{s^2}\mu^2 - 2\mu\frac{n}{s^2}\bar{y}\right)\right)
\exp\left(-\frac{1}{2}\left(\frac{1}{C}\mu^2 - 2\mu\frac{1}{C}m\right)\right) \\
&= \exp\left(-\frac{1}{2}\left(\left(\frac{1}{C} + \frac{n}{s^2}\right)\mu^2 - 2\mu\left(\frac{1}{C}m + \frac{n}{s^2}\bar{y}\right)\right)\right) \\
&= \exp\left(-\frac{1}{2\left(\frac{1}{C}+\frac{n}{s^2}\right)^{-1}}\left(\mu^2 - 2\mu\,\frac{\frac{1}{C}m + \frac{n}{s^2}\bar{y}}{\frac{1}{C}+\frac{n}{s^2}}\right)\right).
\end{aligned}
$$
This is the kernel of a $N(m', C')$ where
$$ C' = \left(C^{-1} + n/s^2\right)^{-1}, \qquad m' = C'\left(C^{-1}m + (n/s^2)\bar{y}\right). $$
Normal model, unknown mean: posterior comments

Let $P = 1/C$, $P' = 1/C'$, and $Q = 1/s^2$ be the relevant precisions (inverse variances). Then:

- The posterior precision is the sum of the prior and observation precisions:
$$ P' = P + \sum_{i=1}^n Q = P + nQ. $$
- The posterior mean is a precision-weighted average of the prior mean and the data:
$$ m' = \frac{1}{P'}\left[Pm + nQ\bar{y}\right] = \frac{P}{P'}m + \frac{nQ}{P'}\bar{y} = \frac{P}{P'}m + \sum_{i=1}^n \frac{Q}{P'}y_i. $$
- The Jeffreys prior/posterior are the limits of the conjugate prior/posterior as $C \to \infty$ (see the sketch below), i.e.
$$ \lim_{C\to\infty} N(m, C) \stackrel{d}{\to} \;\propto 1, \qquad \lim_{C\to\infty} N(m', C') \stackrel{d}{\to} N(\bar{y}, s^2/n). $$
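A small illustration of the last point (a sketch; the numbers are hypothetical): as the prior variance C grows, the conjugate posterior $N(m', C')$ approaches the Jeffreys-prior posterior $N(\bar{y}, s^2/n)$.

# Conjugate posterior approaches N(ybar, s2/n) as C -> infinity
m = 0; s2 = 1; n = 3; ybar = 1.17
for (C in c(1, 100, 1e6)) {
  Cp = 1 / (1/C + n/s2)            # posterior variance C'
  mp = Cp * (m/C + n*ybar/s2)      # posterior mean m'
  cat(sprintf("C = %-7g  m' = %.5f  C' = %.5f\n", C, mp, Cp))
}
c(ybar = ybar, s2_over_n = s2/n)   # the Jeffreys-prior limits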
Normal model, unknown mean: example

Consider $Y_i \stackrel{ind}{\sim} N(\mu, 1)$ and $\mu \sim N(0, 1)$.

# Prior
m = 0
C = 1; P = 1/C

# Data
mu = 1
s2 = 1; Q = 1/s2
n = 3
set.seed(6); (y = rnorm(n, mu, sqrt(1/Q)))

[1] 1.2696060 0.3700146 1.8686598

# Posterior
nQ = n*Q
Pp = P + nQ
mp = (P*m + nQ*mean(y))/Pp
Figure: Normal model with unknown mean, normal prior: prior, likelihood, and posterior densities for µ.
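A figure like the one above can be reproduced with base R (a sketch, not the original plotting code), using the quantities m, C, y, s2, n, mp, and Pp computed in the example:

# Prior, (normalized) likelihood, and posterior densities for mu
curve(dnorm(x, mp, sqrt(1/Pp)), -2, 4, n = 501,
      xlab = expression(mu), ylab = "Density")             # posterior
curve(dnorm(x, m, sqrt(C)), add = TRUE, lty = 2)           # prior
curve(dnorm(x, mean(y), sqrt(s2/n)), add = TRUE, lty = 3)  # likelihood
legend("topright", c("Posterior", "Prior", "Likelihood"), lty = 1:3)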
Normal model, unknown variance: Jeffreys prior for σ²

Theorem. If $Y_i \stackrel{iid}{\sim} N(m, \sigma^2)$ with $m$ known, the Jeffreys prior for $\sigma^2$ is $p(\sigma^2) \propto 1/\sigma^2$.

Proof. Since the normal distribution with unknown variance is an exponential family, use Casella & Berger Lemma 7.3.11:
$$
\begin{aligned}
-E_y\left[\frac{\partial^2}{\partial(\sigma^2)^2}\log p(y|\sigma^2)\right]
&= -E_y\left[\frac{\partial^2}{\partial(\sigma^2)^2}\left(-n\log(2\pi\sigma^2)/2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-m)^2\right)\right] \\
&= -E_y\left[\frac{\partial}{\partial\sigma^2}\left(-\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^n (y_i-m)^2}{2(\sigma^2)^2}\right)\right] \\
&= -E_y\left[\frac{n}{2(\sigma^2)^2} - \frac{\sum_{i=1}^n (y_i-m)^2}{(\sigma^2)^3}\right] \\
&= -\frac{n}{2(\sigma^2)^2} + \frac{n\sigma^2}{(\sigma^2)^3} = \frac{n}{2}(\sigma^2)^{-2}
\end{aligned}
$$
so
$$ p(\sigma^2) \propto \sqrt{|I(\sigma^2)|} \propto 1/\sigma^2. $$
Thus the Jeffreys prior is $p(\sigma^2) \propto 1/\sigma^2$.
Normal model, unknown variance: posterior propriety

Since $\int_0^\infty (1/\sigma^2)\, d\sigma^2$ is not finite, we need to check posterior propriety.

Theorem. For $n > 0$ and at least one $y_i \neq m$, the posterior for a normal variance (known mean) using Jeffreys prior is proper.

Proof. The posterior is
$$
\begin{aligned}
p(\sigma^2|y) &\propto p(y|\sigma^2)p(\sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i-m]^2\right)(\sigma^2)^{-1} \\
&\propto (\sigma^2)^{-n/2-1}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i-m]^2\right).
\end{aligned}
$$
This is the kernel of an inverse gamma distribution with shape $n/2$ and scale $\sum_{i=1}^n [y_i-m]^2/2$, which is proper so long as $n > 0$ and at least one $y_i \neq m$.
Normal model, unknown variance: natural conjugate prior

Let $Y_i \stackrel{iid}{\sim} N(m, \sigma^2)$ with $m$ known. The likelihood is
$$ L(\sigma^2) \propto (\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i-m]^2\right). $$
This is the kernel of an inverse gamma distribution, so the natural conjugate prior is $\sigma^2 \sim IG(a, b)$. Then
$$
\begin{aligned}
p(\sigma^2|y) &\propto p(y|\sigma^2)p(\sigma^2) \\
&= (\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i-m]^2\right)(\sigma^2)^{-a-1}\exp(-b/\sigma^2) \\
&= (\sigma^2)^{-(a+n/2)-1}\exp\left(-\frac{1}{\sigma^2}\left[b + \sum_{i=1}^n [y_i-m]^2/2\right]\right).
\end{aligned}
$$
This is the kernel of an inverse gamma distribution with shape $a + n/2$ and scale $b + \sum_{i=1}^n [y_i-m]^2/2$.
Normal model, unknown variance: example

Suppose $Y_i \stackrel{ind}{\sim} N(1, \sigma^2)$ and $\sigma^2 \sim IG(1, 1)$.

# Prior
a = b = 1

# Data (the same y simulated in the normal-mean example)
m = 1
n = length(y)
y

[1] 1.2696060 0.3700146 1.8686598

# Posterior
ap = a + n/2
(bp = b + sum((y-m)^2)/2)

[1] 1.612069
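Posterior summaries follow directly (a sketch using base R only): if $\sigma^2 \sim IG(a, b)$ then $1/\sigma^2 \sim Ga(a, b)$ in the rate parameterization, so quantiles of $\sigma^2$ are reciprocals of complementary gamma quantiles, and the posterior mean is $b'/(a'-1)$ for $a' > 1$.

# Posterior mean and 95% central credible interval for sigma^2
post_mean = bp / (ap - 1)                        # valid since ap = 2.5 > 1
ci = 1 / qgamma(c(0.975, 0.025), ap, rate = bp)  # (lower, upper)
c(mean = post_mean, lower = ci[1], upper = ci[2])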
Figure: Normal model with unknown variance, inverse gamma prior: prior, likelihood, and posterior densities for σ².
Summary

Suppose $Y_i \sim N(\mu, \sigma^2)$.

µ unknown (σ² known):
- Jeffreys prior: $p(\mu) \propto 1$ (think of this as $N(0, \infty)$)
- Natural conjugate prior: $N(m, C)$
- Posterior: $N(m', C')$ with $C' = [1/C + n\sigma^{-2}]^{-1}$ and $m' = C'[m/C + n\sigma^{-2}\bar{y}]$

σ² unknown (µ known):
- Jeffreys prior: $p(\sigma^2) \propto 1/\sigma^2$ (think of this as $IG(0, 0)$)
- Natural conjugate prior: $IG(a, b)$
- Posterior: $IG\left(a + n/2,\; b + \sum_{i=1}^n (y_i - \mu)^2/2\right)$
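For reference, both conjugate updates fit in a few lines of R (a sketch; the function names are mine, not from the course code):

# Conjugate updates for the normal mean (known variance)
# and the normal variance (known mean)
normal_mean_update = function(m, C, y, s2) {
  n  = length(y)
  Cp = 1 / (1/C + n/s2)
  list(m = Cp * (m/C + n*mean(y)/s2), C = Cp)
}
normal_var_update = function(a, b, y, m) {
  list(a = a + length(y)/2, b = b + sum((y - m)^2)/2)
}

For example, normal_mean_update(0, 1, y, 1) reproduces mp and 1/Pp from the earlier example.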
JAGS

Just Another Gibbs Sampler (JAGS) "is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS." We will use JAGS through its R interface rjags. The basic workflow when using rjags is:

1. Define model and priors in a string
2. Assign data
3. Run JAGS, i.e. simulate from the posterior
4. Summarize as necessary, e.g. mean, median, credible intervals, etc.
JAGS: binomial model

Let $Y \sim Bin(n, \theta)$ and $\theta \sim Be(1, 1)$, and suppose we observe $y = 3$ successes out of $n = 10$ attempts.

library(rjags)  # R interface to JAGS

model = "
model {
  y ~ dbin(theta, n)  # notice: probability first, then sample size
  theta ~ dbeta(a, b)
}
"

dat = list(n = 10, y = 3, a = 1, b = 1)
m = jags.model(textConnection(model), dat)

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 1
   Unobserved stochastic nodes: 1
   Total graph size: 5
Initializing model

r = coda.samples(m, "theta", n.iter = 1000)
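A variation worth knowing about (not shown in the slides): jags.model accepts an n.chains argument, and running several chains supports the convergence diagnostics discussed later in the course.

# Hypothetical variation: four chains, suppressing compilation output
m4 = jags.model(textConnection(model), dat, n.chains = 4, quiet = TRUE)
r4 = coda.samples(m4, "theta", n.iter = 1000)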
JAGS: binomial model (cont.)

summary(r)

Iterations = 1001:2000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

      Mean       SD  Naive SE Time-series SE
  0.324050 0.122776  0.003883       0.004139

2. Quantiles for each variable:

   2.5%    25%    50%    75%  97.5%
 0.1110 0.2337 0.3103 0.4091 0.5760
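Because the beta prior is conjugate to the binomial likelihood, this example also has a closed-form posterior, $\theta|y \sim Be(a + y, b + n - y) = Be(4, 8)$, so the MCMC output above can be checked against exact values (a sketch reusing the dat list from the previous slide):

# Exact posterior mean and quantiles for comparison with the JAGS output
with(dat, c(mean = (a + y)/(a + b + n)))  # 0.333 vs the MCMC mean 0.324
with(dat, qbeta(c(0.025, 0.25, 0.5, 0.75, 0.975), a + y, b + n - y))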