Latent Variable Models for GWAS


  1. Latent Variable Models for GWAS
     Oliver Stegle
     Machine Learning and Computational Biology Research Group
     Max Planck Institutes Tübingen, Germany
     September 2011

  2. Motivation: Why latent variables?
     Causal influences on phenotypes:
     ◮ Genotype: the primary variable of interest
     ◮ Known confounding factors: covariates, population structure, ...
     ◮ Unknown (latent) confounders: sample handling, sample history, subtle environmental perturbations
     [Figure: genotype matrix (SNPs across individuals, the genome) linked to phenotypes y_1, ..., y_N (the phenome), with covariates, population structure, and hidden confounders also acting on the phenotypes.]


  3. Outline
     ◮ Motivation
     ◮ Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)
     ◮ Modeling hidden confounders in GWAS: model and applications
     ◮ Modeling unobserved cellular phenotypes in genetic analyses: model and applications
     ◮ A unifying view
     ◮ Summary

  4. Manifolds and dimension reduction
     [Figure: example low-dimensional embeddings of nonlinear manifolds. From Olivier Grisel; generated using the Modular Data Processing toolkit and matplotlib.]
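
To give a concrete feel for the manifold idea on this slide: the original figure was produced with the Modular Data Processing toolkit, but a similar picture can be sketched with scikit-learn's Isomap (a swapped-in tool; the dataset, neighbor count, and all names below are illustrative choices, not from the talk).

    # Embed a 3-D S-curve onto its underlying 2-D manifold with Isomap.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_s_curve
    from sklearn.manifold import Isomap

    X, color = make_s_curve(n_samples=1000, random_state=0)  # points on a 2-D manifold in R^3
    Z = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

    plt.scatter(Z[:, 0], Z[:, 1], c=color, s=5)
    plt.title("2-D Isomap embedding of a 3-D S-curve")
    plt.show()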

  5. Linear dimension reduction
     ◮ Map G-dimensional data onto a K-dimensional manifold, K ≪ G:
       Y = H W + Ψ, with dimensions (N×G) = (N×K)(K×G) + (N×G)
     ◮ H: latent factors in low-dimensional space
     ◮ W: weights for factors on data dimensions
     ◮ Ψ: noise, ψ_{n,g} ∼ N(0, σ²)
     ◮ Challenge: neither W nor H is known!
     ◮ Depending on the assumptions on W and H:
       ◮ Principal component analysis (PCA)
       ◮ Independent component analysis (ICA)
       ◮ ...
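
To make the model concrete, here is a minimal NumPy simulation sketch of Y = HW + Ψ (all sizes, the seed, and the variable names are illustrative assumptions, not from the talk):

    import numpy as np

    rng = np.random.default_rng(0)
    N, G, K = 100, 50, 3                     # samples, data dimensions, latent dimensions (K << G)

    H = rng.normal(size=(N, K))              # latent factors (unobserved in practice)
    W = rng.normal(size=(K, G))              # factor weights on each data dimension
    Psi = 0.1 * rng.normal(size=(N, G))      # noise, psi_ng ~ N(0, sigma^2) with sigma = 0.1

    Y = H @ W + Psi                          # observed N x G data matrix

    # The challenge on the slide: given only Y, infer H and W. They are
    # identifiable only up to rotation/scaling, which is why different
    # assumptions on H and W yield PCA, ICA, and so on.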

  6. Linear dimension reduction: PCA
     PCA corresponds to a noise-free version of the model: Y = H W, with dimensions (N×G) = (N×K)(K×G).
     ◮ The PCA components (H) correspond to the directions of maximum data variance in the original dataset:
       ◮ Covariance matrix: C = YᵀY
       ◮ Eigenvalues/eigenvectors: C vᵢ = λᵢ vᵢ
       ◮ Projection matrix: P = [v₁, ..., v_K]
       ◮ Principal components: Hₙ = Yₙ P (projecting each row Yₙ of Y)
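
A minimal NumPy sketch of exactly this recipe (the function name and the centering step are my additions; it assumes the Y from the earlier snippet):

    import numpy as np

    def pca(Y, K):
        Yc = Y - Y.mean(axis=0)               # center each data dimension
        C = Yc.T @ Yc / Yc.shape[0]           # G x G covariance matrix C = Y^T Y / N
        eigvals, eigvecs = np.linalg.eigh(C)  # eigh since C is symmetric; ascending order
        P = eigvecs[:, np.argsort(eigvals)[::-1][:K]]  # top-K eigenvectors [v_1, ..., v_K]
        return Yc @ P, P                      # principal components H_n = Y_n P

    H_hat, P = pca(Y, K=3)                    # recovers H up to rotation and scaling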

  7. Linear dimension reduction: Bayesian PCA and GPLVM
     Assumption: the data dimensions (respectively the samples) are independent given H and W.
     Probabilistic PCA [Tipping and Bishop, 1999]:
       p(Y | H, W) = ∏_{n=1}^{N} N(y_n | h_n W, σ² I)
       p(H) = ∏_{n=1}^{N} N(h_n | 0, σ_h² I)
       Marginalizing out H: p(Y | W) = ∏_{n=1}^{N} N(y_n | 0, σ_h² WᵀW + σ² I)
     GPLVM [Lawrence, 2005]:
       p(Y | H, W) = ∏_{g=1}^{G} N(y_{:,g} | H w_g, σ² I)
       p(W) = ∏_{g=1}^{G} N(w_{:,g} | 0, σ_h² I)
       Marginalizing out W: p(Y | H) = ∏_{g=1}^{G} N(y_{:,g} | 0, σ_h² HHᵀ + σ² I), an N×N covariance
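
The GPLVM marginal likelihood above is straightforward to evaluate directly. A hedged sketch using SciPy (the function, the hyperparameter values, and the reuse of Y and H_hat from the earlier snippets are illustrative assumptions):

    import numpy as np
    from scipy.stats import multivariate_normal

    def gplvm_log_marginal(Y, H, sigma_h2=1.0, sigma2=0.1):
        N, G = Y.shape
        K_cov = sigma_h2 * (H @ H.T) + sigma2 * np.eye(N)   # shared N x N linear kernel
        # Columns of Y are independent Gaussians given H, so log-likelihoods sum over g
        return sum(multivariate_normal.logpdf(Y[:, g], mean=np.zeros(N), cov=K_cov)
                   for g in range(G))

    print(gplvm_log_marginal(Y, H_hat))       # evaluate at the PCA estimate of H

Maximizing this quantity over H (and the hyperparameters) gives the GPLVM fit; with the linear kernel shown here the optimum coincides with PCA, and nonlinear kernels generalize it.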
