
TRANSFERRING DIFFUSION BASED MANIFOLD LEARNING TO TRAJECTORIES AND TIME VARYING DATA (PowerPoint presentation)



  1. TRANSFERRING DIFFUSION BASED MANIFOLD LEARNING TO TRAJECTORIES AND TIME VARYING DATA
     Matthew Hirn, Michigan State University
     Jointly with Daniel Burkhardt (Yale), William Chen (Yale), Ronald Coifman (Yale), Natalia Ivanova (Yale), Smita Krishnaswamy (Yale), Nicholas Marshall (Yale), Kevin Moon (Yale), Antonia van den Elzen (Yale), David van Dijk (Sloan-Kettering), Zheng Wang (Yale), Guy Wolf (Yale)
     [Figure: (B) an artificial tree and (C) embryoid bodies, each embedded with diffusion maps, tSNE, PHATE and PCA; stem cell data (cells × genes) in PHATE coordinates, with branches labeled stem cell, blastocysts, endothelium, endothelial-myeloid progenitors, myeloid cells, muscle precursors and vascular muscle cells; example data sets: MARS-seq, CyTOF, iPSC, Facebook, Hi-C, gut microbiome (Prevotella, Firmicutes), bone marrow.]

  2. MANIFOLD LEARNING
     • Data $X = \{x_1, \ldots, x_n\} \subset (\mathcal{M}, g) \hookrightarrow \mathbb{R}^d$, sampled i.i.d. from some distribution
     • Extrinsic dimension $d$ possibly large: $\dim(\mathcal{M}) \ll d$
     • How do we obtain new coordinates $X \mapsto Y = \{y_1, \ldots, y_n\} \subset \mathbb{R}^k$ with $k \sim \dim(\mathcal{M})$ that preserve the underlying local geometry?
     • Can we simultaneously emphasize clusters within the data?


  4. MANIFOLD LEARNING
     [Figure: first coordinate of the embedding.]

  5. DIFFUSION MAPS (Coifman, Lafon 2006; Nadler, Lafon, Coifman, Kevrekidis 2006)
     • Local similarity kernel: $K_{ij} = k(x_i, x_j) = e^{-\|x_i - x_j\|^2 / \epsilon}$
     • Sampling density estimate: $Q_{ii} = \sum_j K_{ij}$
     • Density normalization: $\tilde{K} = Q^{-\alpha} K Q^{-\alpha}$
       ◦ $\alpha = 0 \Rightarrow$ full influence of sampling statistics
       ◦ $\alpha = 1/2 \Rightarrow$ stochastic differential equations
       ◦ $\alpha = 1 \Rightarrow$ geometry only, no sampling bias (used in this talk)
     • One more normalization: $D_{ii} = \sum_j \tilde{K}_{ij}$
     • Random walk: $P = P_\epsilon = D^{-1} \tilde{K}$
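     The normalization steps above can be written out directly in NumPy. This is a minimal dense sketch, assuming Euclidean input data small enough to hold an $n \times n$ matrix in memory; the function name `diffusion_operator` and the toy data are hypothetical, not taken from the talk.

```python
import numpy as np

def diffusion_operator(X, eps, alpha=1.0):
    """Return the random walk P = D^{-1} K_tilde and the normalized kernel K_tilde.

    X: (n, d) array of samples; eps: kernel bandwidth epsilon;
    alpha: density-normalization exponent (alpha = 1: geometry only).
    """
    # Squared Euclidean distances ||x_i - x_j||^2 for all pairs
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)                      # local similarity kernel K_ij
    q = K.sum(axis=1)                                # sampling density estimate Q_ii
    K_tilde = K / np.outer(q ** alpha, q ** alpha)   # Q^{-alpha} K Q^{-alpha}
    d = K_tilde.sum(axis=1)                          # second normalization D_ii
    P = K_tilde / d[:, None]                         # row-stochastic walk D^{-1} K_tilde
    return P, K_tilde


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # hypothetical toy data: n = 50 points in R^3
P, K_tilde = diffusion_operator(X, eps=1.0)
```

     Because $\tilde{K}$ stays symmetric, $P$ is similar to the symmetric matrix $D^{-1/2} \tilde{K} D^{-1/2}$, which is what makes its eigendecomposition well behaved.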

  6. DIFFUSION MAPS (Coifman, Lafon 2006)
     • Define the diffusion distance as: $D_t(x_i, x_j)^2 = \sum_{l=1}^{n} \frac{(P^t_{il} - P^t_{jl})^2}{\pi_l}$, where $\pi$ is the stationary distribution of $P$
     • Theorem [CL06]: For $\alpha = 1$ (assumed from here forward), $\lim_{n \to \infty,\ \epsilon \to 0} P_\epsilon^{t/\epsilon} = e^{t\Delta}$ (the heat kernel)
     • Heat equation: $\partial_t u = \Delta u$
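     The diffusion distance can be evaluated straight from its definition. The sketch below assumes the construction of slide 5 with $\alpha = 1$ and takes $\pi_l = D_{ll} / \sum_m D_{mm}$, which is the stationary distribution of $P$ because $\tilde{K}$ is symmetric; the helper names and data are hypothetical.

```python
import numpy as np

def diffusion_operator(X, eps, alpha=1.0):
    """Random walk P = D^{-1} K_tilde and the degrees D_ii (as on slide 5)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q ** alpha, q ** alpha)
    d = K_tilde.sum(axis=1)
    return K_tilde / d[:, None], d

def diffusion_distance(P, pi, t):
    """All pairwise D_t(x_i, x_j), with D_t^2 = sum_l (P^t_il - P^t_jl)^2 / pi_l."""
    Pt = np.linalg.matrix_power(P, t)       # t-step transition probabilities
    W = Pt / np.sqrt(pi)[None, :]           # divide column l by sqrt(pi_l)
    diff = W[:, None, :] - W[None, :, :]    # (n, n, n) array of row differences
    return np.sqrt(np.sum(diff ** 2, axis=-1))


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))               # hypothetical toy data
P, d = diffusion_operator(X, eps=0.5)
pi = d / d.sum()                           # stationary distribution: pi P = pi
Dt = diffusion_distance(P, pi, t=4)
```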


  8. DIFFUSION MAPS (Bérard, Besson, Gallot 1994; Coifman, Lafon 2006)
     • Let $1 = \lambda_0 > \lambda_1 \geq \cdots \geq \lambda_{n-1} \geq 0$ be the eigenvalues of $P$, with eigenvectors $\mathbf{1} = \psi_0, \psi_1, \ldots, \psi_{n-1}$.
     • Define the diffusion map: $\Psi_t(x_i) = (\lambda_1^t \psi_1(x_i), \ldots, \lambda_{n-1}^t \psi_{n-1}(x_i))$, truncated to give a low dimensional embedding
     • Theorem [BBG94, CL06]: The diffusion distance satisfies $D_t(x_i, x_j) = \|\Psi_t(x_i) - \Psi_t(x_j)\|$
     • Theorem [BBG94]: If we compute $\Psi_t$ using the heat kernel, the pulled back metric $\Psi_t^*$ is asymptotic to the metric $g$ of $\mathcal{M}$ as $t \to 0^+$
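     The identity $D_t(x_i, x_j) = \|\Psi_t(x_i) - \Psi_t(x_j)\|$ can be checked numerically. This sketch assumes $\alpha = 1$ and computes eigenvectors through the symmetric matrix $D^{-1/2} \tilde{K} D^{-1/2}$, which has the same eigenvalues as $P$; the right eigenvectors are scaled so that $\sum_l \pi_l \psi_k(l)^2 = 1$, the normalization under which the identity holds exactly when no coordinates are truncated. Function names and data are hypothetical.

```python
import numpy as np

def normalized_kernel(X, eps, alpha=1.0):
    """K_tilde = Q^{-alpha} K Q^{-alpha} and degrees D_ii (as on slide 5)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q ** alpha, q ** alpha)
    return K_tilde, K_tilde.sum(axis=1)

def diffusion_map(K_tilde, d, t, k=None):
    """Psi_t(x_i) = (lambda_1^t psi_1(x_i), ..., lambda_k^t psi_k(x_i))."""
    vol = d.sum()
    A = K_tilde / np.sqrt(np.outer(d, d))   # symmetric, same spectrum as P
    lam, phi = np.linalg.eigh(A)            # ascending eigenvalues
    lam, phi = lam[::-1], phi[:, ::-1]      # descending: lambda_0 = 1 first
    # Right eigenvectors of P, normalized so that sum_l pi_l psi_k(l)^2 = 1
    psi = np.sqrt(vol) * phi / np.sqrt(d)[:, None]
    k = len(d) - 1 if k is None else k      # k < n - 1 truncates the embedding
    return (lam[1:k + 1] ** t) * psi[:, 1:k + 1]


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))               # hypothetical toy data
K_tilde, d = normalized_kernel(X, eps=0.5)
t = 3
Psi = diffusion_map(K_tilde, d, t)
emb_dist = np.linalg.norm(Psi[:, None, :] - Psi[None, :, :], axis=-1)

# Diffusion distance computed directly from its definition, for comparison
P = K_tilde / d[:, None]
pi = d / d.sum()
W = np.linalg.matrix_power(P, t) / np.sqrt(pi)[None, :]
direct = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
```

     Passing a small `k` gives the truncated, low dimensional embedding; the distances then only approximate $D_t$, with an error controlled by the discarded $\lambda_j^t$.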

  9. OUTLINE
     • Non-manifold data: metric trees, biology and PHATE
     • Time varying data: time coupled diffusion maps and condensation
     • Future directions and conclusions

  10. Non-manifold trajectory data
     [Figure: stem cell data with branches labeled stem cell, blastocysts, endothelium, endothelial-myeloid progenitors, myeloid cells, muscle precursors and vascular muscle cells.]

  11. METRIC TREE EMBEDDINGS (Moon, van Dijk, Wang, Burkhardt, Chen, van den Elzen, H., Coifman, Ivanova, Wolf, Krishnaswamy 2017)
     Diffusion maps: what is happening here?

  12. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge) and the diffusion maps embeddings $x \mapsto (\psi_i(x), \psi_j(x))$ for $i, j = 1, \ldots, 10$.]

  13. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_1(x)$.]

  14. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_2(x)$.]

  15. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_3(x)$.]

  16. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_4(x)$.]

  17. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_5(x)$.]

  18. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_6(x)$.]

  19. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_7(x)$.]

  20. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_8(x)$.]

  21. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_9(x)$.]

  22. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_{10}(x)$.]

  23. DIFFUSION MAPS AND METRIC TREES
     [Figure: metric tree (colored by edge), eigenvector $\psi_{10}(x)$.]
     • Conjecture: To embed the tree with diffusion maps as $x \mapsto (\psi_{i_1}(x), \ldots, \psi_{i_k}(x))$, we need $k \sim$ depth of the tree.
     • Punchline: The information is there, but we need a different way to get at it.

  24. GEOMETRY AND TIME SCALES
     • Theorem [Varadhan 1967]: Small time diffusions preserve geometry. Let $K(t, x, x')$ be the heat kernel on $\mathcal{M}$. Then $\lim_{t \to 0^+} t \log K(t, x, x') = -\frac{1}{4} r(x, x')^2$, where $r$ is the geodesic distance on $\mathcal{M}$.
     • Numerically, though, this is perilous.
     • However, on $\mathcal{M} = \mathbb{R}^d$ we have $K(t, x, x') = (4\pi t)^{-d/2} \exp(-|x - x'|^2 / (4t))$, and so in this case, for all $t > 0$: $t \log K(t, x, x') = -\frac{d}{2} t \log(4\pi t) - \frac{1}{4} |x - x'|^2$
     • Metric trees lie somewhere in between these two regimes, so we propagate $P^t$ for an intermediate value of $t$ and compute $U(t)_{ij} = t \log P^t_{ij}$.
     • We then apply multidimensional scaling (MDS) to the rows of $U(t)$ to get the PHATE embedding.
     • It is an open problem to make the above reasoning rigorous.
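     The last two bullets can be sketched as follows. This only illustrates the slide's recipe, assuming the slide 5 random walk with $\alpha = 1$, classical (Torgerson) MDS on Euclidean distances between rows of $U(t)$, and a small floor inside the logarithm to avoid $\log 0$; the published PHATE method makes its own choices of kernel, time scale and MDS variant that are not reproduced here. The helper names and the three-branch toy data set are hypothetical.

```python
import numpy as np

def random_walk(X, eps, alpha=1.0):
    """Row-stochastic random walk D^{-1} K_tilde (as on slide 5)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q ** alpha, q ** alpha)
    return K_tilde / K_tilde.sum(axis=1, keepdims=True)

def potential_mds(P, t, k=2, floor=1e-12):
    """U(t)_ij = t log P^t_ij, then classical MDS on the rows of U(t)."""
    Pt = np.linalg.matrix_power(P, t)
    U = t * np.log(np.maximum(Pt, floor))   # floor guards against log(0) (assumption)
    # Squared Euclidean distances between rows of U(t)
    G = U @ U.T
    g = np.diag(G)
    D2 = np.maximum(g[:, None] + g[None, :] - 2.0 * G, 0.0)
    # Classical MDS: double-center, keep the top-k eigenpairs
    n = U.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    B = -0.5 * J @ D2 @ J
    w, V = np.linalg.eigh(B)                # ascending eigenvalues
    top = np.argsort(w)[::-1][:k]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))


# Hypothetical toy tree: three noisy branches leaving the origin
rng = np.random.default_rng(1)
dirs = np.array([[1.0, 0.0], [-0.5, 0.87], [-0.5, -0.87]])
s = rng.uniform(0.0, 1.0, size=(3, 20))
X = np.concatenate([s[b][:, None] * dirs[b] for b in range(3)])
X += 0.02 * rng.normal(size=X.shape)
Y = potential_mds(random_walk(X, eps=0.05), t=10)
```

     Here $t = 10$ plays the role of the "intermediate" time scale; the slide leaves its choice open, and so does this sketch.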

  25. BACK TO THE BINARY TREE
     [Figure: metric tree (colored by edge) and its PHATE embedding.]

  26. STEM CELL DATA
     [Figure: PHATE embedding (axes PHATE 1 and PHATE 2) of a cells × genes data set, with branches labeled stem cell, blastocysts, endothelium, endothelial-myeloid progenitors, myeloid cells, muscle precursors and vascular muscle cells.]

  27. STEM CELL DATA
     [Figure: PHATE, PCA, tSNE and diffusion maps embeddings, with populations labeled ESC, mesoderm, neuroectoderm, NCC, NCC progenitors, neural progenitors, cardiac progenitors and a mix of mesoderm, ESC and NCC.]

  28. MOUSE BEHAVIORAL DATA

  29. OPEN QUESTIONS
     • Can we develop a rigorous mathematical theory relating diffusion geometry to metric trees?
     • If so, can these ideas in turn shed light on the more general problems of intersecting hyperplanes or intersecting manifolds?
     • All of these directions are potentially relevant to the topics in the remainder of this talk.

  30. Time varying manifold data

  31. TIME VARYING DATA MODEL (Marshall, H. 2017)
     What about time varying data, but with minimal assumptions on the data generation process?
     • Manifold model: compact Riemannian manifold $(\mathcal{M}, g(t))$ with smoothly varying metric $g(t)$
     • New heat equation: $\partial_t u = \Delta_{g(t)} u$ (couples heat diffusion with changing geometry)
     • Theorem [Guenther 2002]: There exists a fundamental solution (heat kernel) $Z(x, t; x', s)$ for the above heat equation.
     [Figure: snapshots of the data at t = 1, 20, 40, 60, 80, 100, 120, 140, 160, 180.]
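     A discrete counterpart of diffusion under a changing geometry can be sketched by building one random walk per time snapshot and composing them, $P^{(m)} = P_m \cdots P_2 P_1$. This composition is an assumption made for illustration, not a construction stated on this slide; it also assumes that the same $n$ points are observed at every time. The helper names and the stretched-ellipse snapshots are hypothetical.

```python
import numpy as np

def diffusion_operator(X, eps, alpha=1.0):
    """Row-stochastic random walk D^{-1} K_tilde for one snapshot (as on slide 5)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q ** alpha, q ** alpha)
    return K_tilde / K_tilde.sum(axis=1, keepdims=True)

def time_coupled_operator(snapshots, eps):
    """Compose per-snapshot walks: P^(m) = P_m ... P_2 P_1 (hypothetical helper)."""
    n = snapshots[0].shape[0]
    P_total = np.eye(n)
    for X in snapshots:                     # snapshots ordered in time
        P_total = diffusion_operator(X, eps) @ P_total
    return P_total


# Hypothetical data: 80 points on a closed curve that stretches over 10 time steps
theta = np.linspace(0.0, 2.0 * np.pi, 80, endpoint=False)
snapshots = [np.column_stack([np.cos(theta), (1.0 + 0.1 * s) * np.sin(theta)])
             for s in range(10)]
P_m = time_coupled_operator(snapshots, eps=0.05)
```

     Each factor is row-stochastic, so the composition is again a Markov matrix and could be passed to the same embedding steps used for a single $P$.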
