Empirical Bayes
Will Penny - PowerPoint PPT Presentation


  1. Empirical Bayes
  Will Penny
  3rd March 2011

  Outline:
  Linear Models: fMRI analysis
  Gradient Ascent: Online learning, Delta Rule, Newton Method
  Bayesian Linear Models: MAP Learning, MEG Source Reconstruction
  Empirical Bayes: Model Evidence, Isotropic Covariances, Linear Covariances, Gradient Ascent, MEG Source Reconstruction
  Restricted Maximum Likelihood: Augmented Form, ReML Objective Function
  References

  2. General Linear Model

  The General Linear Model (GLM) is given by

      y = Xw + e

  where y are data, X is a design matrix, and e are zero-mean Gaussian errors with covariance V. The above equation implicitly defines the likelihood function

      p(y|w) = N(y; Xw, V)

  where the Normal density is given by

      N(x; \mu, C) = (2\pi)^{-N/2} |C|^{-1/2} \exp\left( -\frac{1}{2}(x - \mu)^T C^{-1}(x - \mu) \right)
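  As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch that evaluates the log of the Normal density above, and hence the GLM log-likelihood log p(y|w); the function name gaussian_log_density and all data are invented for this example:

  ```python
  import numpy as np

  def gaussian_log_density(x, mu, C):
      """log N(x; mu, C) for the density defined on the slide."""
      N = len(x)
      r = x - mu
      quad = r @ np.linalg.solve(C, r)        # (x-mu)^T C^{-1} (x-mu), no explicit inverse
      _, logdet = np.linalg.slogdet(C)        # log |C|, numerically stable
      return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)

  # Toy GLM: y = Xw + e with e ~ N(0, V)
  rng = np.random.default_rng(0)
  X = rng.standard_normal((20, 3))
  w = np.array([1.0, -0.5, 2.0])
  V = 0.1 * np.eye(20)
  y = X @ w + rng.multivariate_normal(np.zeros(20), V)

  # The likelihood of the data under the model: log p(y|w) = log N(y; Xw, V)
  print(gaussian_log_density(y, X @ w, V))
  ```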

  3. Maximum Likelihood

  If we know V then we can estimate w by maximising the likelihood, or equivalently the log-likelihood

      L = -\frac{N}{2}\log 2\pi - \frac{1}{2}\log|V| - \frac{1}{2}(y - Xw)^T V^{-1}(y - Xw)

  We can compute the gradient with help from the Matrix Reference Manual

      \frac{dL}{dw} = X^T V^{-1} y - X^T V^{-1} X w

  Setting this to zero leads to the solution

      \hat{w}_{ML} = (X^T V^{-1} X)^{-1} X^T V^{-1} y

  This is often referred to as Weighted Least Squares (WLS), \hat{w}_{ML} = \hat{w}_{WLS}. It is useful when, for example, some observations are more reliable than others (Penny et al., 2007).
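  A minimal sketch of the WLS estimator, assuming a diagonal V whose entries encode per-observation reliability; the helper name wls_estimate and the toy data are invented for illustration:

  ```python
  import numpy as np

  def wls_estimate(X, y, V):
      """w_ML = (X^T V^{-1} X)^{-1} X^T V^{-1} y, via solves rather than inverses."""
      Vinv_X = np.linalg.solve(V, X)          # V^{-1} X
      Vinv_y = np.linalg.solve(V, y)          # V^{-1} y
      return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

  # Unequal reliability: the second half of the observations is much noisier.
  rng = np.random.default_rng(1)
  X = np.column_stack([np.ones(50), np.arange(50.0)])
  noise_var = np.where(np.arange(50) < 25, 0.1, 5.0)
  V = np.diag(noise_var)
  y = X @ np.array([2.0, 0.3]) + rng.standard_normal(50) * np.sqrt(noise_var)

  print(wls_estimate(X, y, V))                # close to the true [2.0, 0.3]
  ```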

  4. fMRI analysis

  For fMRI time series analysis we have a linear model at each voxel i

      y_i = X w_i + e_i

  V_i = Cov(e_i) is estimated first (see later) and then the regression coefficients are computed using Maximum Likelihood (ML) estimation

      \hat{w}_i = (X^T V_i^{-1} X)^{-1} X^T V_i^{-1} y_i

  The fitted responses are then \hat{y}_i = X \hat{w}_i (SPM Manual).
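  The voxel-wise estimation might be sketched as below; fit_voxels and the simulated data are hypothetical, and the per-voxel covariances V_i are simply assumed known here, whereas the slides note they are estimated first:

  ```python
  import numpy as np

  def fit_voxels(X, Y, V_list):
      """ML estimates w_i for each voxel i; Y has one column per voxel."""
      W = []
      for y_i, V_i in zip(Y.T, V_list):
          Vinv_X = np.linalg.solve(V_i, X)
          W.append(np.linalg.solve(X.T @ Vinv_X, X.T @ np.linalg.solve(V_i, y_i)))
      return np.array(W).T                    # regressors x voxels

  rng = np.random.default_rng(2)
  T, K, n_voxels = 100, 2, 4
  X = rng.standard_normal((T, K))
  W_true = rng.standard_normal((K, n_voxels))
  V_list = [(0.5 + i) * np.eye(T) for i in range(n_voxels)]   # per-voxel noise level
  E = np.column_stack([rng.multivariate_normal(np.zeros(T), V) for V in V_list])
  Y = X @ W_true + E

  W_hat = fit_voxels(X, Y, V_list)
  Y_fitted = X @ W_hat                        # fitted responses y_i = X w_i
  ```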

  5. fMRI analysis

  The uncertainty in the ML estimates is given by

      S = (X^T V_i^{-1} X)^{-1}

  Contrast vectors c can then be used to test for specific effects

      \mu_c = c^T \hat{w}_i

  The uncertainty in the effect is then

      \sigma_c^2 = c^T S c

  and a t-score is then given by t = \mu_c / \sigma_c.
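  A small sketch of the contrast and t-score computation, with invented data and an assumed-known error covariance V for the voxel:

  ```python
  import numpy as np

  rng = np.random.default_rng(3)
  T = 120
  X = np.column_stack([np.ones(T), rng.standard_normal(T), rng.standard_normal(T)])
  V = 0.8 * np.eye(T)                         # error covariance, assumed known here
  y = X @ np.array([0.5, 1.2, -0.3]) + rng.multivariate_normal(np.zeros(T), V)

  # ML estimate and its covariance S = (X^T V^{-1} X)^{-1}
  Vinv_X = np.linalg.solve(V, X)
  S = np.linalg.inv(X.T @ Vinv_X)
  w_hat = S @ (X.T @ np.linalg.solve(V, y))

  # Test the effect of the second regressor with a contrast vector c
  c = np.array([0.0, 1.0, 0.0])
  mu_c = c @ w_hat                            # effect size mu_c = c^T w_hat
  sigma_c = np.sqrt(c @ S @ c)                # its standard deviation
  print("t =", mu_c / sigma_c)
  ```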

  6. Least Squares

  For isotropic error covariance V = \lambda^{-1} I, the normal equations are

      \frac{dL}{dw} = \lambda X^T y - \lambda X^T X w

  This leads to the Ordinary Least Squares (OLS) solution

      \hat{w}_{OLS} = (X^T X)^{-1} X^T y

  with \hat{w}_{ML} = \hat{w}_{OLS}.
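  A quick numerical check (not from the slides) that with isotropic covariance the scalar precision cancels, so WLS reduces to OLS; the data and the choice lam = 25.0 are arbitrary:

  ```python
  import numpy as np

  rng = np.random.default_rng(4)
  X = rng.standard_normal((40, 3))
  y = X @ np.array([1.0, 2.0, -1.0]) + 0.2 * rng.standard_normal(40)

  # OLS: w = (X^T X)^{-1} X^T y (lstsq is the numerically preferred route)
  w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

  # WLS with V = (1/lam) I gives the same answer: lam cancels in
  # (X^T V^{-1} X)^{-1} X^T V^{-1} y.
  lam = 25.0
  V = np.eye(40) / lam
  w_wls = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))

  print(np.allclose(w_ols, w_wls))            # True
  ```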

  7. Gradient Ascent

  In gradient ascent approaches an objective function L is maximised by changing the parameters w to follow the local gradient

      \tau \frac{dw}{dt} = \frac{dL}{dw}

  where \tau is the time constant that defines the learning rate. In discrete time, parameters are then updated as

      w_t = w_{t-1} + \frac{1}{\tau} \left. \frac{dL}{dw} \right|_{w_{t-1}}

  Smaller time constants \tau correspond to bigger updates at each step, that is, to faster learning rates. In the batch version of gradient ascent the gradient is computed from all pattern pairs (x_n, y_n), n = 1..N. In the sequential version, updates are based on gradients from individual patterns (see later).
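  A minimal sketch of batch gradient ascent on the Gaussian log-likelihood using the discrete-time update above; the time constant tau, the iteration count and the toy data are arbitrary choices for the example:

  ```python
  import numpy as np

  rng = np.random.default_rng(5)
  X = rng.standard_normal((100, 2))
  w_true = np.array([1.5, -0.7])
  lam = 4.0                                   # noise precision
  y = X @ w_true + rng.standard_normal(100) / np.sqrt(lam)

  tau = 1000.0                                # time constant: 1/tau is the step size
  w = np.zeros(2)
  for _ in range(300):
      grad = lam * X.T @ (y - X @ w)          # dL/dw, computed from all patterns
      w = w + grad / tau                      # w_t = w_{t-1} + (1/tau) dL/dw
  print(w)                                    # approaches w_true
  ```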

  8. Neural Implementations

  Many 'neural implementations' or neural network models are derived by taking a standard statistical model, e.g. linear models, hierarchical linear models, or (non-)linear dynamical systems, and then maximising some cost function (e.g. the likelihood or posterior probability) using a sequential gradient ascent approach.

  When the same model is applied to, for example, neuroimaging data, more sophisticated optimisation methods, e.g. Newton methods (see later), are used.

  9. Online Learning - Sequential Gradient Ascent

  In some situations observations may be made sequentially. For independent observations we have

      p(y|w) = \prod_{n=1}^{N} p(y_n|w)

  where

      p(y_n|w) = N(y_n; x_n w, \lambda^{-1}) = \frac{1}{Z} \exp\left( -\frac{\lambda}{2}(y_n - x_n w)^2 \right)

  and x_n is the nth row of X. Now take logs to give

      L_n = \log p(y_n|w) = -\frac{\lambda}{2}(y_n - x_n w)^2 - \log Z

  Predictions with smaller error have higher likelihood. Online learning then proceeds by following the gradients based on individual patterns.
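  A quick numerical check (not from the slides) that the per-observation terms L_n sum to the joint log-likelihood when observations are independent; the data are invented:

  ```python
  import numpy as np

  rng = np.random.default_rng(6)
  N, lam = 50, 2.0
  X = rng.standard_normal((N, 3))
  w = np.array([0.5, -1.0, 2.0])
  y = X @ w + rng.standard_normal(N) / np.sqrt(lam)

  log_Z = 0.5 * np.log(2 * np.pi / lam)       # normaliser of N(y_n; x_n w, 1/lam)

  # Per-observation log-likelihoods L_n = -lam/2 (y_n - x_n w)^2 - log Z
  L_n = -0.5 * lam * (y - X @ w) ** 2 - log_Z

  # log p(y|w) = sum_n L_n for independent observations
  print(L_n.sum())
  ```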

  10. Online Learning

  For the linear model, the learning rule for the ith coefficient is

      \tau \frac{dw_i}{dt} = \frac{dL_n}{dw_i} = \lambda x_n(i)(y_n - x_n w)

  Learning is faster for high-precision observations, larger inputs and bigger prediction errors. One can use this in signal processing applications such as Real-Time fMRI.
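  A one-step sketch of this rule; online_update is an invented helper, written in vector form so that every coefficient w_i is updated at once:

  ```python
  import numpy as np

  def online_update(w, x_n, y_n, lam, tau):
      """One sequential step: w_i += (1/tau) * lam * x_n(i) * (y_n - x_n w)."""
      error = y_n - x_n @ w                   # prediction error for this observation
      return w + (lam / tau) * x_n * error

  w = np.zeros(2)
  w = online_update(w, x_n=np.array([1.0, 0.5]), y_n=2.0, lam=4.0, tau=100.0)
  print(w)                                    # [0.08, 0.04]
  ```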

  11. Delta Rule

  If \lambda is the same for all observations it can be absorbed into the learning rate. The above expression then reduces to the Delta Rule (Widrow and Hoff, 1960)

      \tau \frac{dw_i}{dt} = x_n(i)(y_n - x_n w)

  If observations have different precisions then

      \tau \frac{dw_i}{dt} = \lambda_n x_n(i)(y_n - x_n w)
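  A toy simulation of the precision-weighted delta rule on a stream of observations drawn with two reliability levels; all settings here are invented for illustration:

  ```python
  import numpy as np

  rng = np.random.default_rng(7)
  w_true = np.array([1.0, -2.0])
  w, tau = np.zeros(2), 200.0

  # Stream of observations, each with its own precision lambda_n
  for _ in range(5000):
      x_n = rng.standard_normal(2)
      lam_n = rng.choice([0.5, 10.0])              # unreliable vs reliable samples
      y_n = x_n @ w_true + rng.standard_normal() / np.sqrt(lam_n)
      w += (lam_n / tau) * x_n * (y_n - x_n @ w)   # precision-weighted delta rule
  print(w)                                         # close to w_true
  ```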

  12. Example - Linear Regression

  For the linear model

      y = Xw + e

  with Cov(e) = \lambda^{-1} I, the log-likelihood is, up to an additive constant,

      L(w) = -\frac{\lambda}{2}(y - Xw)^T (y - Xw)

  The gradient is

      \frac{dL}{dw} = \lambda X^T y - \lambda X^T X w = \lambda X^T (y - Xw)

  Following this gradient corresponds to the Delta rule.
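  A short check that the batch gradient equals the sum of the per-pattern delta-rule gradients, which is the sense in which following it corresponds to the Delta rule (toy data, invented names):

  ```python
  import numpy as np

  rng = np.random.default_rng(8)
  X = rng.standard_normal((30, 2))
  y = X @ np.array([2.0, 1.0]) + 0.1 * rng.standard_normal(30)
  w, lam = np.array([0.3, -0.4]), 4.0

  # Batch gradient dL/dw = lam * X^T (y - Xw)
  batch_grad = lam * X.T @ (y - X @ w)

  # Sum over patterns of the delta-rule gradients lam * x_n * (y_n - x_n w)
  seq_grad = sum(lam * x_n * (y_n - x_n @ w) for x_n, y_n in zip(X, y))

  print(np.allclose(batch_grad, seq_grad))    # True
  ```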
