Nonlinear models
Will Penny
Bayesian Inference Course, WTCN, UCL, March 2013

Outline: Nonlinear Regression (Priors, Energies, Posterior, Gradient Ascent, Adaptive Step Size); Approach to Limit Example (Priors, Posterior); Oscillator Example; Sampling (Metropolis-Hastings, Proposal density); References

Nonlinear Regression

We consider the framework implemented in the SPM function spm_nlsi_GN.m. It implements Bayesian estimation of nonlinear models of the form

$$ y = g(w) + e $$

where g(w) is some nonlinear function of parameters w, and e is zero-mean additive Gaussian noise with covariance C_y. The likelihood of the data is therefore

$$ p(y | w, \lambda) = N(y; g(w), C_y) $$

The error precision matrix is assumed to decompose linearly

$$ C_y^{-1} = \sum_i \exp(\lambda_i) Q_i $$

where Q_i are known precision basis functions and λ are hyperparameters, e.g. Q = I, noise precision s = exp(λ).

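The likelihood above can be illustrated for the simplest case Q = I, where C_y^{-1} = exp(λ) I. The following is a minimal sketch in plain Python, not the SPM implementation; the function name `log_likelihood` and the toy numbers are assumptions made for illustration.

```python
import math

def log_likelihood(y, g_w, lam):
    """log N(y; g(w), C_y) assuming C_y^{-1} = exp(lam) * I (i.e. Q = I)."""
    n = len(y)
    s = math.exp(lam)  # noise precision s = exp(lambda)
    sse = sum((yi - gi) ** 2 for yi, gi in zip(y, g_w))
    # log|C_y| = -n * lam because C_y = exp(-lam) * I
    return -0.5 * s * sse + 0.5 * n * lam - 0.5 * n * math.log(2 * math.pi)

# Toy data and model predictions (hypothetical values)
y = [1.0, 2.0, 3.0]
pred = [1.1, 1.9, 3.2]
ll = log_likelihood(y, pred, lam=0.0)
print(ll)
```

Increasing `lam` sharpens the likelihood around the predictions, so a poor fit is penalised more heavily while a good fit is rewarded through the `0.5 * n * lam` term.
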
Priors

We allow Gaussian priors over model parameters

$$ p(w) = N(w; \mu_w, C_w) $$

where the prior mean and covariance are assumed known.

The hyperparameters are constrained by the prior

$$ p(\lambda) = N(\lambda; \mu_\lambda, C_\lambda) $$

This is not Empirical Bayes.

Generative Model

[Figure: VL generative model]

$$ p(y, w, \lambda) = p(y | w, \lambda) \, p(w) \, p(\lambda) $$

Energies

The above distributions allow one to write down an expression for the joint log likelihood of the data, parameters and hyperparameters

$$ L(w, \lambda) = \log [ p(y | w, \lambda) \, p(w) \, p(\lambda) ] $$

It splits into three terms

$$ L(w, \lambda) = \log p(y | w, \lambda) + \log p(w) + \log p(\lambda) $$

Joint Log Likelihood

The joint log likelihood is composed of sum squared precision-weighted prediction errors and entropy terms

$$
\begin{aligned}
L(w, \lambda) = &-\frac{1}{2} e_y^T C_y^{-1} e_y - \frac{1}{2} \log |C_y| - \frac{N_y}{2} \log 2\pi \\
&-\frac{1}{2} e_w^T C_w^{-1} e_w - \frac{1}{2} \log |C_w| - \frac{N_w}{2} \log 2\pi \\
&-\frac{1}{2} e_\lambda^T C_\lambda^{-1} e_\lambda - \frac{1}{2} \log |C_\lambda| - \frac{N_\lambda}{2} \log 2\pi
\end{aligned}
$$

where prediction errors are the difference between what is expected and what is observed

$$ e_y = y - g(m_w), \quad e_w = m_w - \mu_w, \quad e_\lambda = m_\lambda - \mu_\lambda $$

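The three Gaussian terms can be evaluated directly. The sketch below assumes diagonal prior covariances, a single hyperparameter and Q = I; these are simplifications for illustration, and the helper names and the toy linear model `g` are hypothetical rather than taken from SPM.

```python
import math

def gauss_logpdf_diag(e, var):
    """log N(e; 0, diag(var)): quadratic error term plus normalising terms."""
    quad = sum(ei * ei / vi for ei, vi in zip(e, var))
    logdet = sum(math.log(vi) for vi in var)
    return -0.5 * quad - 0.5 * logdet - 0.5 * len(e) * math.log(2 * math.pi)

def joint_log_likelihood(y, g, w, lam, mu_w, var_w, mu_lam, var_lam):
    """L(w, lam) = log p(y|w,lam) + log p(w) + log p(lam), assuming Q = I."""
    e_y = [yi - gi for yi, gi in zip(y, g(w))]     # data prediction error
    e_w = [wi - mi for wi, mi in zip(w, mu_w)]     # parameter prediction error
    e_lam = [lam - mu_lam]                         # hyperparameter prediction error
    noise_var = [math.exp(-lam)] * len(y)          # C_y = exp(-lam) * I
    return (gauss_logpdf_diag(e_y, noise_var)
            + gauss_logpdf_diag(e_w, var_w)
            + gauss_logpdf_diag(e_lam, [var_lam]))

# Toy linear model used only to exercise the function (an assumption)
g = lambda w: [w[0] * t for t in (1.0, 2.0)]
y = [0.5, 1.0]
L_at_mode = joint_log_likelihood(y, g, [0.5], 0.0, [0.5], [1.0], 0.0, 1.0)
L_away = joint_log_likelihood(y, g, [0.9], 0.0, [0.5], [1.0], 0.0, 1.0)
print(L_at_mode, L_away)
```

With all prediction errors at zero only the normalising terms remain, so `L_at_mode` is larger than `L_away`.
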
VL Posteriors

The Variational Laplace (VL) algorithm, implemented in spm_nlsi_GN.m, assumes an approximate posterior density of the following factorised form

$$
\begin{aligned}
q(w, \lambda | y) &= q(w | y) \, q(\lambda | y) \\
q(w | y) &= N(w; m_w, S_w) \\
q(\lambda | y) &= N(\lambda; m_\lambda, S_\lambda)
\end{aligned}
$$

This is a fixed-form variational method.

Variational Energies

The approximate posteriors are estimated by minimising the Kullback-Leibler (KL) divergence between the true posterior and these approximate posteriors. This is implemented by maximising the following (negative) variational energies

$$ I_w = \int L(w, \lambda) \, q(\lambda) \, d\lambda $$

$$ I_\lambda = \int L(w, \lambda) \, q(w) \, dw $$

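Because q(λ) is Gaussian with mean m_λ and covariance S_λ, one standard way to approximate the integral defining I_w is a second-order Taylor expansion of L about m_λ. The derivation below is a generic Laplace-style sketch under that assumption, not a line-by-line account of the SPM code. The linear term drops out because the expectation of λ − m_λ under q is zero.

```latex
\begin{aligned}
L(w,\lambda) &\approx L(w,m_\lambda)
  + (\lambda-m_\lambda)^T \left.\frac{\partial L}{\partial \lambda}\right|_{m_\lambda}
  + \frac{1}{2}(\lambda-m_\lambda)^T
    \left.\frac{\partial^2 L}{\partial \lambda^2}\right|_{m_\lambda}
    (\lambda-m_\lambda) \\
I_w &= \int L(w,\lambda)\, q(\lambda)\, d\lambda
  \approx L(w,m_\lambda)
  + \frac{1}{2}\,\mathrm{tr}\!\left( S_\lambda
    \left.\frac{\partial^2 L}{\partial \lambda^2}\right|_{m_\lambda} \right)
\end{aligned}
```

The same argument with the roles of w and λ exchanged gives the corresponding approximation for I_λ.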
Gradient Ascent

This maximisation is effected by first computing the gradient and curvature of the variational energies at the current parameter estimate, m_w(old). For example, for the parameters we have

$$ j_w(i) = \frac{d I_w}{d m_w(i)} $$

$$ H_w(i, j) = \frac{d^2 I_w}{d m_w(i) \, d m_w(j)} $$

where i and j index the i-th and j-th parameters, j_w is the gradient vector and H_w is the curvature matrix. The estimate for the posterior mean is then given by

$$ m_w(\text{new}) = m_w(\text{old}) + \Delta m_w $$

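When analytic derivatives are not available, the gradient and curvature can be approximated by central finite differences. The sketch below shows one generic way to do this in plain Python; the function names, step sizes and the quadratic test energy are assumptions for illustration and do not describe how SPM computes its derivatives.

```python
def gradient(f, m, h=1e-5):
    """Central-difference estimate of j(i) = dI/dm(i)."""
    g = []
    for i in range(len(m)):
        up = list(m); up[i] += h
        dn = list(m); dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

def hessian(f, m, h=1e-4):
    """Central-difference estimate of H(i, j) = d2I / dm(i) dm(j)."""
    n = len(m)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = list(m)
                p[i] += di
                p[j] += dj
                return f(p)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

# Hypothetical concave energy with known derivatives, used as a check
energy = lambda m: -(m[0] - 1) ** 2 - 2 * (m[1] + 0.5) ** 2 + 0.5 * m[0] * m[1]
j_w = gradient(energy, [0.0, 0.0])
H_w = hessian(energy, [0.0, 0.0])
print(j_w, H_w)
```

For this test energy the analytic values at the origin are j_w = [2, -2] and H_w = [[-2, 0.5], [0.5, -4]], which the estimates match up to rounding error.
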
Adaptive Step Size

The change is given by

$$ \Delta m_w = -H_w^{-1} j_w $$

which is equivalent to a Newton update (Press et al. 2007).

This implements a step in the direction of the gradient with a step size given by the inverse curvature. Big steps are taken in regions where the gradient changes slowly (low curvature).

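The update can be tried on a two-parameter problem, where the 2x2 inverse has a closed form. This is a minimal sketch of a plain Newton step; the quadratic energy with peak at (1, -2) is a hypothetical example chosen because a single step should land exactly on its maximum.

```python
def newton_step(m, grad, hess):
    """Return m + dm with dm = -H^{-1} j for a two-parameter problem."""
    j = grad(m)
    (a, b), (c, d) = hess(m)
    det = a * d - b * c
    # H^{-1} = (1/det) * [[d, -b], [-c, a]], so dm = -H^{-1} j
    dm0 = -(d * j[0] - b * j[1]) / det
    dm1 = -(-c * j[0] + a * j[1]) / det
    return [m[0] + dm0, m[1] + dm1]

# Hypothetical energy I(m) = -0.5 * (m - peak)^T A (m - peak)
A = [[2.0, 0.5], [0.5, 1.0]]
peak = [1.0, -2.0]

def grad(m):
    d0, d1 = m[0] - peak[0], m[1] - peak[1]
    return [-(A[0][0] * d0 + A[0][1] * d1), -(A[1][0] * d0 + A[1][1] * d1)]

def hess(m):
    return [[-A[0][0], -A[0][1]], [-A[1][0], -A[1][1]]]

m_new = newton_step([0.0, 0.0], grad, hess)
print(m_new)
```

On a non-quadratic energy the step is only locally accurate, so in practice the update is iterated until the change in the estimate becomes small.
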
Approach to Limit Example

$$ y(t) = -60 + V_a [ 1 - \exp(-t / \tau) ] + e(t) $$

[Figure: example data from the approach-to-limit model]

V_a = 30, τ = 8. Noise precision s = exp(λ) = 1.

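Data of this kind can be simulated in a few lines. The sketch below uses the stated values V_a = 30, τ = 8 and s = 1; the sampling grid of 41 time points and the random seed are assumptions, since the slide does not specify them.

```python
import math
import random

def approach_to_limit(t, v_a, tau):
    """Noise-free model: g(t) = -60 + V_a * (1 - exp(-t / tau))."""
    return -60.0 + v_a * (1.0 - math.exp(-t / tau))

def simulate(times, v_a=30.0, tau=8.0, s=1.0, seed=0):
    """Add zero-mean Gaussian noise with precision s (variance 1/s)."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(s)
    return [approach_to_limit(t, v_a, tau) + rng.gauss(0.0, sd) for t in times]

times = [float(t) for t in range(41)]  # hypothetical sampling grid
y = simulate(times)
print(y[:3])
```

The noise-free curve starts at -60 and saturates at -60 + V_a = -30, with τ setting how quickly the limit is approached.
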
Prior Landscape

A plot of log p(w) where w = [log τ, log V_a]

[Figure: prior landscape]

$$ \mu_w = [3, 1.6]^T, \quad C_w = \mathrm{diag}([1/16, 1/16]) $$

Samples from Prior

The true model parameters, V_a = 30 and τ = 8, are unlikely a priori.

[Figure: samples from the prior]

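Samples of this kind can be drawn by sampling w from the Gaussian prior and exponentiating. The sketch below takes the prior mean [3, 1.6] and variances [1/16, 1/16] together with the ordering w = [log τ, log V_a] exactly as stated on the Prior Landscape slide; the function name, seed and number of samples are assumptions.

```python
import math
import random

MU_W = [3.0, 1.6]    # prior mean of w = [log tau, log V_a], as stated on the slide
SD_W = [0.25, 0.25]  # square roots of diag(C_w) = [1/16, 1/16]

def sample_prior(n, seed=0):
    """Draw n independent samples of (tau, V_a) from the log-normal prior."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        w = [rng.gauss(m, s) for m, s in zip(MU_W, SD_W)]
        samples.append({"tau": math.exp(w[0]), "V_a": math.exp(w[1])})
    return samples

samples = sample_prior(5)
for smp in samples:
    print(smp)
```

Working in log space guarantees that every sampled τ and V_a is positive.
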
Prior Noise Precision

Q = I. Noise precision s = exp(λ) with

$$ p(\lambda) = N(\lambda; \mu_\lambda, C_\lambda) $$

with µ_λ = 0. We used C_λ = 1/16 (left) and C_λ = 1/4 (right). True noise precision, s = 1.

[Figure: prior density over noise precision for C_λ = 1/16 (left) and C_λ = 1/4 (right)]

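Since λ is Gaussian, s = exp(λ) is log-normal, and the two prior variances can be compared through the range of precisions they allow. This sketch computes a central 95% interval for s; the helper name and the use of z = 1.96 are choices made here for illustration.

```python
import math

def precision_interval(mu_lam, c_lam, z=1.96):
    """Central interval for s = exp(lambda) when lambda ~ N(mu_lam, c_lam)."""
    sd = math.sqrt(c_lam)
    return math.exp(mu_lam - z * sd), math.exp(mu_lam + z * sd)

narrow = precision_interval(0.0, 1 / 16)  # C_lambda = 1/16
wide = precision_interval(0.0, 1 / 4)     # C_lambda = 1/4
print(narrow, wide)
```

The intervals come out at roughly 0.61 to 1.63 for C_λ = 1/16 and 0.38 to 2.66 for C_λ = 1/4, so both priors contain the true value s = 1 and the second is considerably less informative.
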
Posterior Landscape

A plot of log [p(y | w) p(w)]

[Figure: posterior landscape]

VL optimisation

Path of 6 VL iterations (x marks start)

[Figure: VL optimisation path on the posterior landscape]

Investigate further using matlab/lif.

Oscillator Example

This example is based on a differential equation describing the evolution of a voltage variable, v, and a recovery variable, r

$$ \dot{v} = c \left[ v - \frac{1}{3} v^3 + r + I \right] $$

$$ \dot{r} = -\frac{1}{c} \left[ v - a + b r \right] $$

This is used in statistics as an example of a difficult optimisation problem with multiple local maxima (Ramsay et al., 2007).

Oscillator Example

For a = 0.2, b = 0.2, c = 3 and I = 0

[Figure: oscillator time series]

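A trajectory for these parameter values can be generated by numerically integrating the two equations. The sketch below uses a fixed-step fourth-order Runge-Kutta scheme; the initial state (v, r) = (-1, 1), the step size and the number of steps are assumptions, as the slides do not state them.

```python
def fhn_deriv(v, r, a, b, c, i_ext):
    """Right-hand side of the oscillator equations for (v, r)."""
    dv = c * (v - v ** 3 / 3.0 + r + i_ext)
    dr = -(v - a + b * r) / c
    return dv, dr

def integrate(v0, r0, a=0.2, b=0.2, c=3.0, i_ext=0.0, dt=0.01, n_steps=2000):
    """Fixed-step RK4 integration; returns the list of (v, r) states."""
    v, r = v0, r0
    path = [(v, r)]
    for _ in range(n_steps):
        k1 = fhn_deriv(v, r, a, b, c, i_ext)
        k2 = fhn_deriv(v + 0.5 * dt * k1[0], r + 0.5 * dt * k1[1], a, b, c, i_ext)
        k3 = fhn_deriv(v + 0.5 * dt * k2[0], r + 0.5 * dt * k2[1], a, b, c, i_ext)
        k4 = fhn_deriv(v + dt * k3[0], r + dt * k3[1], a, b, c, i_ext)
        v += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        r += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        path.append((v, r))
    return path

path = integrate(-1.0, 1.0)  # hypothetical initial state
print(path[-1])
```

Because every evaluation of the likelihood requires such an integration, and the solution depends nonlinearly on a and b, the resulting objective can have several local maxima.
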
Oscillator Example

A plot of log [p(y | w) p(w)]

[Figure: posterior landscape for the oscillator model]

Parameters w = [a, b]. Fix I = 0, c = 3.