  1. Three versions of manifolds
     Sayan Mukherjee
     Departments of Statistical Science, Computer Science, Mathematics
     Institute for Genome Sciences & Policy, Duke University
     Joint work with:
     Part I – F. Liang (UIUC), Q. Wu (MTSU), K. Mao (LYZ Capital)
     Part II – J. Guinney (Fred Hutchinson CC), Q. Wu (MTSU), D.X. Zhou (City University Hong Kong), M. Maggioni (Duke University)
     Part III – P.R. Hahn (University of Chicago)
     October 3, 2011

  2. A play in three acts
     (Outline: Supervised dimension reduction; Geometric analysis for SDR; Generative model for manifold learning; Acknowledgements)
     (1) Bayesian model for supervised dimension reduction.

  3. A play in three acts
     (1) Bayesian model for supervised dimension reduction.
     (2) Geometric analysis for SDR based on gradients.

  4. A play in three acts
     (1) Bayesian model for supervised dimension reduction.
     (2) Geometric analysis for SDR based on gradients.
     (3) Generative model for manifold learning using Lie groups.

  5. Information and sufficiency
     (Outline: Introduction; Supervised dimension reduction; Unsupervised dimension reduction; Geometric analysis for SDR; Likelihood based SDR; Generative model for manifold learning; Results on data; Acknowledgements; Challenges)
     A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R.A. Fisher and goes back to at least Adcock 1878 and Edgeworth 1884.

  6. Information and sufficiency
     A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R.A. Fisher and goes back to at least Adcock 1878 and Edgeworth 1884.
     Example: X_1, ..., X_n drawn iid from a Gaussian can be reduced to the sufficient statistics (µ, σ²).
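The Gaussian example above can be checked numerically. This is an illustration, not from the talk; the parameter values (µ = 2, σ = 3) and sample size are invented:

```python
import numpy as np

# For X_1, ..., X_n iid N(mu, sigma^2), the pair (sample mean, sample
# variance) is sufficient: the full sample can be reduced to these two
# numbers without losing information about (mu, sigma^2).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)

mu_hat = x.mean()            # estimates mu (true value 2.0)
sigma2_hat = x.var(ddof=1)   # estimates sigma^2 (true value 9.0)
```

With n = 10,000 draws the two summaries land close to the true (µ, σ²), which is the whole content of the reduction: nothing else about the sample is needed.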

  7. Regression
     Assume the model
     Y = f(X) + ε,  E[ε] = 0,
     with X ∈ X ⊂ R^p and Y ∈ R.

  8. Regression
     Assume the model
     Y = f(X) + ε,  E[ε] = 0,
     with X ∈ X ⊂ R^p and Y ∈ R.
     Data: D = {(x_i, y_i)}_{i=1}^n, drawn iid from ρ(X, Y).

  9. Dimension reduction
     If the data lives in a p-dimensional space X ∈ R^p, replace X with Θ(X) ∈ R^d, p ≫ d.

  10. Dimension reduction
      If the data lives in a p-dimensional space X ∈ R^p, replace X with Θ(X) ∈ R^d, p ≫ d.
      My belief: physical, biological and social systems are inherently low dimensional, and the variation of interest in these systems can be captured by a low-dimensional submanifold.

  11. Dimension reduction
      If the data lives in a p-dimensional space X ∈ R^p, replace X with Θ(X) ∈ R^d, p ≫ d.
      My belief: physical, biological and social systems are inherently low dimensional, and the variation of interest in these systems can be captured by a low-dimensional submanifold.
      ρ_X is concentrated on a manifold M ⊂ R^p of dimension d ≪ p.

  12. Supervised dimension reduction (SDR)
      Given response variables Y_1, ..., Y_n ∈ R and explanatory variables or covariates X_1, ..., X_n ∈ X ⊂ R^p,
      Y_i = f(X_i) + ε_i,  ε_i iid ∼ N(0, σ²).

  13. Supervised dimension reduction (SDR)
      Given response variables Y_1, ..., Y_n ∈ R and explanatory variables or covariates X_1, ..., X_n ∈ X ⊂ R^p,
      Y_i = f(X_i) + ε_i,  ε_i iid ∼ N(0, σ²).
      Is there a submanifold S ≡ S_{Y|X} such that Y ⊥⊥ X | P_S(X)?

  14. Linear projections capture nonlinear manifolds
      In this talk P_S(X) = B^T X, where B = (b_1, ..., b_d).

  15. Linear projections capture nonlinear manifolds
      In this talk P_S(X) = B^T X, where B = (b_1, ..., b_d).
      Semiparametric model:
      Y_i = f(X_i) + ε_i = g(b_1^T X_i, ..., b_d^T X_i) + ε_i,
      where span(B) is the dimension reduction (d.r.) subspace.
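The semiparametric model can be simulated in a few lines. A minimal numpy sketch, not the speaker's code; the dimensions, the orthonormal choice of B, and the link function g are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 500, 10, 2

# B = (b_1, ..., b_d): an orthonormal basis spanning the d.r. subspace
B, _ = np.linalg.qr(rng.normal(size=(p, d)))

X = rng.normal(size=(n, p))
Z = X @ B                                    # projections b_j^T X_i, shape (n, d)
g = lambda z: np.sin(z[:, 0]) + z[:, 1]**2   # hypothetical nonparametric link g
Y = g(Z) + 0.1 * rng.normal(size=n)

# Y depends on the p-dimensional X only through the d-dimensional B^T X,
# which is exactly the conditional-independence statement Y ⊥⊥ X | P_S(X).
```

The SDR task is the inverse problem: given only (X, Y) pairs, recover span(B).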

  16. Show video

  17. Visualization of SDR
      [Figure: four panels comparing embeddings of the same data set: (a) Data, (b) Diffusion map, (c) GOP, (d) GDM; panels (b)-(d) plot Dimension 1 vs Dimension 2.]

  18. Principal components analysis (PCA)
      Algorithmic view of PCA:
      1. Given X = (X_1, ..., X_n), a p × n matrix, construct Σ̂ = (X − X̄)(X − X̄)^T.

  19. Principal components analysis (PCA)
      Algorithmic view of PCA:
      1. Given X = (X_1, ..., X_n), a p × n matrix, construct Σ̂ = (X − X̄)(X − X̄)^T.
      2. Eigendecompose Σ̂: λ_i v_i = Σ̂ v_i.
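The two algorithmic steps can be sketched directly in numpy (an illustration, not the speaker's code; the variance-inflated first coordinate is an invented example to give the data an obvious leading direction):

```python
import numpy as np

rng = np.random.default_rng(2)
# X stored as a p x n matrix, one observation per column (as on the slide)
p, n = 5, 200
X = rng.normal(size=(p, n))
X[0] *= 10.0   # inflate variance along coordinate 0

# Step 1: centered scatter matrix  Sigma_hat = (X - Xbar)(X - Xbar)^T
Xbar = X.mean(axis=1, keepdims=True)
Sigma_hat = (X - Xbar) @ (X - Xbar).T

# Step 2: eigendecomposition  lambda_i v_i = Sigma_hat v_i
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)   # eigenvalues in ascending order
top_pc = eigvecs[:, -1]                        # leading principal direction
```

Since coordinate 0 carries most of the variance, the leading eigenvector aligns (up to sign) with that axis; projecting onto the top d eigenvectors gives the d-dimensional PCA reduction.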

  20. Probabilistic PCA
      X ∈ R^p is characterized by a multivariate normal
      X ∼ N(µ + Aν, ∆),  ν ∼ N(0, I_d),
      with µ ∈ R^p, A ∈ R^{p×d}, ∆ ∈ R^{p×p}, ν ∈ R^d.

  21. Probabilistic PCA
      X ∈ R^p is characterized by a multivariate normal
      X ∼ N(µ + Aν, ∆),  ν ∼ N(0, I_d),
      with µ ∈ R^p, A ∈ R^{p×d}, ∆ ∈ R^{p×p}, ν ∈ R^d.
      ν is a latent variable.
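Sampling from this generative model makes the latent-variable structure concrete. A sketch under invented settings (p = 6, d = 2, isotropic ∆ = 0.25·I, a small random loading matrix A), not the speaker's code:

```python
import numpy as np

rng = np.random.default_rng(3)
p, d, n = 6, 2, 5000

mu = np.zeros(p)
A = 0.5 * rng.normal(size=(p, d))   # loading matrix, p x d (illustrative)
Delta = 0.25 * np.eye(p)            # noise covariance, here isotropic

# Generative model: nu ~ N(0, I_d), then X | nu ~ N(mu + A nu, Delta)
nu = rng.normal(size=(n, d))
noise = rng.multivariate_normal(np.zeros(p), Delta, size=n)
X = mu + nu @ A.T + noise

# Marginalizing out nu gives X ~ N(mu, A A^T + Delta); compare empirically
emp_cov = np.cov(X.T)
model_cov = A @ A.T + Delta
```

The empirical covariance of the samples matches A A^T + ∆, which is the marginal covariance obtained by integrating out the latent ν.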

  22. SDR model
      Semiparametric model:
      Y_i = f(X_i) + ε_i = g(b_1^T X_i, ..., b_d^T X_i) + ε_i,
      where span(B) is the dimension reduction (d.r.) subspace.

  23. Principal fitted components (PFC)
      Define X_y ≡ (X | Y = y) and specify the multivariate normal distribution
      X_y ∼ N(µ_y, ∆),  µ_y = µ + A ν_y,
      with µ ∈ R^p, A ∈ R^{p×d}, ν_y ∈ R^d.
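The PFC conditional model can also be sampled to see how the response shifts the mean of X along the columns of A. A sketch with invented settings (p = 8, d = 1, ∆ = I, and the simplest illustrative choice ν_y = y), not the method as presented in the talk:

```python
import numpy as np

rng = np.random.default_rng(4)
p, d = 8, 1

mu = np.zeros(p)
A = rng.normal(size=(p, d))
Delta = np.eye(p)

# PFC conditional: given Y = y, X_y ~ N(mu + A nu_y, Delta).
# Here nu_y = y is a hypothetical choice just for illustration.
def sample_X_given_y(y, n):
    nu_y = np.array([y])
    return rng.multivariate_normal(mu + A @ nu_y, Delta, size=n)

X_low = sample_X_given_y(-1.0, 1000)
X_high = sample_X_given_y(+1.0, 1000)

# The conditional means differ only along the direction A,
# so span(A) plays the role of the d.r. subspace.
diff = X_high.mean(axis=0) - X_low.mean(axis=0)   # approx 2 * A[:, 0]
```

Moving y shifts E[X | Y = y] only within span(A), which is why fitting A recovers the dimension reduction subspace.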
