  1. GWAS IV: Bayesian linear (variance component) models. Dr. Oliver Stegle, Christoph Lippert, Prof. Dr. Karsten Borgwardt. Max-Planck-Institutes Tübingen, Germany. Tübingen, Summer 2011.

  2. Motivation: Regression. Linear regression: making predictions, comparison of alternative models. Bayesian and regularized regression: uncertainty in model parameters, generalized basis functions. [Figure: regression of Y on X, with a test input x*.]

  5. Motivation: Further reading, useful material. Christopher M. Bishop, Pattern Recognition and Machine Learning [Bishop, 2006]. Sam Roweis, Gaussian Identities [Roweis, 1999].

  6. Outline.

  7. Outline: Linear Regression II.

  8. Linear Regression II: Regression, noise model and likelihood. Given a dataset $\mathcal{D} = \{\mathbf{x}_n, y_n\}_{n=1}^{N}$, where $\mathbf{x}_n = (x_{n,1}, \ldots, x_{n,S})$ is $S$-dimensional (for example $S$ SNPs), fit parameters $\boldsymbol{\theta}$ of a regressor $f$ with added Gaussian noise: $y_n = f(\mathbf{x}_n; \boldsymbol{\theta}) + \epsilon_n$, where $p(\epsilon \mid \sigma^2) = \mathcal{N}(\epsilon \mid 0, \sigma^2)$. Equivalent likelihood formulation: $p(\mathbf{y} \mid \mathbf{X}) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid f(\mathbf{x}_n), \sigma^2)$.
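
A minimal sketch of this noise model (not part of the original slides; it assumes NumPy/SciPy and made-up sizes and parameter values): simulate $y_n = \mathbf{x}_n \cdot \boldsymbol{\theta} + \epsilon_n$ and evaluate the Gaussian likelihood defined above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical sizes: N individuals, S SNPs (chosen purely for illustration).
N, S = 100, 3
X = rng.integers(0, 3, size=(N, S)).astype(float)  # genotypes coded 0/1/2
theta_true = rng.normal(size=S)                     # assumed "true" effect sizes
sigma = 0.5                                         # noise standard deviation

# Noise model: y_n = f(x_n; theta) + eps_n with eps_n ~ N(0, sigma^2).
y = X @ theta_true + rng.normal(scale=sigma, size=N)

def log_likelihood(theta, X, y, sigma):
    """Equivalent likelihood formulation: sum_n ln N(y_n | x_n . theta, sigma^2)."""
    return norm.logpdf(y, loc=X @ theta, scale=sigma).sum()

print(log_likelihood(theta_true, X, y, sigma))
```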

  9. Linear Regression II: Choosing a regressor. Choose $f$ to be linear: $p(\mathbf{y} \mid \mathbf{X}) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid \mathbf{x}_n \cdot \boldsymbol{\theta} + c, \sigma^2)$. Consider the bias-free case, $c = 0$; otherwise include an additional column of ones in each $\mathbf{x}_n$.

  10. The same linear model, shown as an equivalent graphical model.
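
A small illustration of the bias handling mentioned in slide 9 (an assumed NumPy sketch with a toy matrix, not from the slides): appending a column of ones to the inputs turns the intercept $c$ into just another entry of $\boldsymbol{\theta}$.

```python
import numpy as np

X = np.array([[0., 1.],
              [2., 0.],
              [1., 1.]])                 # toy genotype matrix: N = 3, S = 2

# Augment with a leading column of ones so x_n . theta already contains the bias c.
X_aug = np.column_stack([np.ones(len(X)), X])
print(X_aug)
# [[1. 0. 1.]
#  [1. 2. 0.]
#  [1. 1. 1.]]
```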

  11. Linear Regression II: Linear regression, maximum likelihood. Taking the logarithm, we obtain $\ln p(\mathbf{y} \mid \boldsymbol{\theta}, \mathbf{X}, \sigma^2) = \sum_{n=1}^{N} \ln \mathcal{N}(y_n \mid \mathbf{x}_n \cdot \boldsymbol{\theta}, \sigma^2) = -\frac{N}{2} \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - \mathbf{x}_n \cdot \boldsymbol{\theta})^2$, where the last term is the sum of squares. The likelihood is maximized when the squared error is minimized: least squares and maximum likelihood are equivalent.
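
To make the least-squares / maximum-likelihood equivalence concrete, here is a numerical check under invented data and parameter values: the log-likelihood equals a constant minus the scaled sum of squares, $-\frac{N}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{n}(y_n - \mathbf{x}_n \cdot \boldsymbol{\theta})^2$, so whichever $\boldsymbol{\theta}$ has the smaller sum of squares has the larger log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
N, S = 50, 2
X = rng.normal(size=(N, S))
theta_good = np.array([1.0, -0.5])
y = X @ theta_good + rng.normal(scale=0.3, size=N)
sigma2 = 0.3 ** 2

def sum_of_squares(theta):
    r = y - X @ theta
    return 0.5 * r @ r            # E(theta) = 1/2 sum_n (y_n - x_n . theta)^2

def log_likelihood(theta):
    return -0.5 * N * np.log(2 * np.pi * sigma2) - sum_of_squares(theta) / sigma2

for theta in (theta_good, np.zeros(2)):
    print(theta, sum_of_squares(theta), log_likelihood(theta))
# The parameter vector with the smaller squared error has the larger log-likelihood.
```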

  14. Linear Regression II: Linear regression and least squares. [Figure: data points $(x_n, y_n)$ and a fitted function $f(x_n, \mathbf{w})$; the vertical distances to the curve are the residuals (C. M. Bishop, Pattern Recognition and Machine Learning).] Sum-of-squares error: $E(\boldsymbol{\theta}) = \frac{1}{2} \sum_{n=1}^{N} (y_n - \mathbf{x}_n \cdot \boldsymbol{\theta})^2$.

  15. Linear Regression II: Linear regression and least squares. Derivative with respect to a single weight entry $\theta_i$: $\frac{d}{d\theta_i} \ln p(\mathbf{y} \mid \boldsymbol{\theta}, \sigma^2) = \frac{d}{d\theta_i} \left[ -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - \mathbf{x}_n \cdot \boldsymbol{\theta})^2 \right] = \frac{1}{\sigma^2} \sum_{n=1}^{N} (y_n - \mathbf{x}_n \cdot \boldsymbol{\theta})\, x_{n,i}$. Setting the gradient with respect to $\boldsymbol{\theta}$ to zero, $\nabla_{\boldsymbol{\theta}} \ln p(\mathbf{y} \mid \boldsymbol{\theta}, \sigma^2) = \frac{1}{\sigma^2} \sum_{n=1}^{N} (y_n - \mathbf{x}_n \cdot \boldsymbol{\theta})\, \mathbf{x}_n^{T} = \mathbf{0} \;\Rightarrow\; \boldsymbol{\theta}_{ML} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$, where $(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}$ is the pseudo-inverse. Here the design matrix $\mathbf{X}$ stacks the inputs row-wise, with entries $x_{n,s}$ for $n = 1, \ldots, N$ and $s = 1, \ldots, S$.
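
A sketch of the closed-form estimator on toy NumPy data (in practice one would use a least-squares solver rather than forming an explicit inverse): solve the normal equations $\mathbf{X}^{T}\mathbf{X}\,\boldsymbol{\theta} = \mathbf{X}^{T}\mathbf{y}$ and cross-check with np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(2)
N, S = 200, 4
X = rng.normal(size=(N, S))
y = X @ rng.normal(size=S) + rng.normal(scale=0.1, size=N)

# theta_ML = (X^T X)^{-1} X^T y, computed via the normal equations.
theta_ml = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against the library least-squares routine (same solution up to numerics).
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta_ml, theta_lstsq))
```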

  18. Linear Regression II: Polynomial curve fitting, motivation. Non-linear relationships; multiple SNPs playing a role for a particular phenotype. [Figure: non-linear regression of Y on X, with a test input x*.]

  19. Linear Regression II: Polynomial curve fitting, univariate input $x$. Use polynomials up to degree $K$ to construct new features from $x$: $f(x, \boldsymbol{\theta}) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_K x^K = \sum_{k=0}^{K} \theta_k \phi_k(x) = \boldsymbol{\theta}^{T} \boldsymbol{\phi}(x)$, where we defined $\boldsymbol{\phi}(x) = (1, x, x^2, \ldots, x^K)$. $\boldsymbol{\phi}$ can be any feature mapping. It is possible to show that the feature map $\boldsymbol{\phi}$ can be expressed in terms of kernels (the kernel trick).
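
A brief sketch of this basis expansion for a univariate input (illustrative data and degree, not from the slides): build $\boldsymbol{\phi}(x) = (1, x, x^2, \ldots, x^K)$ for every observation and reuse the same least-squares machinery as before.

```python
import numpy as np

def poly_features(x, K):
    """phi(x) = (1, x, x^2, ..., x^K) for each entry of the 1-D array x."""
    return np.vander(x, N=K + 1, increasing=True)

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)   # assumed toy data

K = 3
Phi = poly_features(x, K)                       # shape (30, K + 1)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)                                    # coefficients theta_0, ..., theta_K
```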

  21. Linear Regression II: Polynomial curve fitting, overfitting. The degree of the polynomial is crucial to avoid under- and overfitting. [Figure: polynomial fit with degree M = 0 (C. M. Bishop, Pattern Recognition and Machine Learning).]

  22. [Figure: polynomial fit with degree M = 1.]

  23. [Figure: polynomial fit with degree M = 3.]

  24. [Figure: polynomial fit with degree M = 9.]
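
The under- and overfitting behaviour in these figures can be reproduced with a small experiment (a sketch under assumed data, loosely following Bishop's $\sin(2\pi x)$ example): training error keeps shrinking as the degree $M$ grows, while error on held-out points eventually increases.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n):
    x = rng.uniform(0, 1, size=n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

for M in (0, 1, 3, 9):
    Phi_train = np.vander(x_train, N=M + 1, increasing=True)
    Phi_test = np.vander(x_test, N=M + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    print(M, rmse(y_train, Phi_train @ theta), rmse(y_test, Phi_test @ theta))
# Training RMSE shrinks monotonically with M; test RMSE is typically smallest
# for a moderate degree (around M = 3) and degrades again for M = 9.
```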

  25. Linear Regression II: Multivariate regression. Polynomial curve fitting: $f(x, \boldsymbol{\theta}) = \theta_0 + \theta_1 x + \cdots + \theta_K x^K = \sum_{k=0}^{K} \theta_k \phi_k(x) = \boldsymbol{\phi}(x) \cdot \boldsymbol{\theta}$. Multivariate regression (SNPs): $f(\mathbf{x}, \boldsymbol{\theta}) = \sum_{s=1}^{S} \theta_s x_s = \mathbf{x} \cdot \boldsymbol{\theta}$. Note: when fitting a single binary SNP genotype $x_i$, a linear model is already the most general model (any function of a binary input is a linear function of it).
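
Finally, a hypothetical multi-SNP version of the same linear model (simulated binary genotypes, bias-free as in slide 9): the multivariate regressor $f(\mathbf{x}, \boldsymbol{\theta}) = \mathbf{x} \cdot \boldsymbol{\theta}$ is fit jointly over all $S$ SNPs.

```python
import numpy as np

rng = np.random.default_rng(5)
N, S = 500, 20
X = rng.integers(0, 2, size=(N, S)).astype(float)   # binary SNP genotypes (0/1)

theta_true = np.zeros(S)
theta_true[:3] = [0.8, -0.5, 0.3]                    # only a few SNPs carry an effect
y = X @ theta_true + rng.normal(scale=0.5, size=N)

# Joint maximum-likelihood / least-squares fit over all SNPs at once.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(theta_hat[:5], 2))
```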
