TRANSFERRING DIFFUSION BASED MANIFOLD LEARNING TO TRAJECTORIES AND TIME VARYING DATA
Matthew Hirn, Michigan State University
Jointly with Daniel Burkhardt (Yale), William Chen (Yale), Ronald Coifman (Yale), Natalia Ivanova (Yale), Smita Krishnaswamy (Yale), Nicholas Marshall (Yale), Kevin Moon (Yale), Antonia van den Elzen (Yale), David van Dijk (Sloan-Kettering), Zheng Wang (Yale), Guy Wolf (Yale)
[Figure: PHATE embedding (axes PHATE 1, PHATE 2; heat map of genes by cells) labeled Stem Cell, Blastocysts, Endothelial-Myeloid Progenitors, Endothelium, Myeloid Cells, Muscle Precursors, Vascular muscle cells. Panel B, Artificial Tree: Diffusion Maps, tSNE, PHATE, PCA. Panel C, Embryoid Bodies: Diffusion Maps, tSNE, PHATE, PCA. Panel D: MARS-seq, CyTOF, iPSC, Facebook, Hi-C, Gut Microbiome, Bone Marrow, Prevotella, Firmicutes.]
MANIFOLD LEARNING
• Data $X = \{x_1, \ldots, x_n\} \subset (\mathcal{M}, g) \to \mathbb{R}^d$, sampled iid from some distribution
• Extrinsic dimension $d$ possibly large: $\dim(\mathcal{M}) \ll d$
• How do we obtain new coordinates $X \mapsto Y = \{y_1, \ldots, y_n\} \subset \mathbb{R}^k$ with $k \sim \dim(\mathcal{M})$ that preserve the underlying local geometry?
• Can we simultaneously emphasize clusters within the data?
[Figure: first coordinate of embedding]
DIFFUSION MAPS (Coifman, Lafon 2006; Nadler, Lafon, Coifman, Kevrekidis 2006)
• Local similarity kernel: $K_{ij} = k(x_i, x_j) = e^{-\|x_i - x_j\|^2 / \varepsilon}$
• Sampling density estimate: $Q_{ii} = \sum_j K_{ij}$
• Density normalization: $\tilde{K} = Q^{-\alpha} K Q^{-\alpha}$
  - $\alpha = 0 \Rightarrow$ full influence of sampling statistics
  - $\alpha = \tfrac{1}{2} \Rightarrow$ stochastic differential equations
  - $\alpha = 1 \Rightarrow$ geometry only, no sampling bias (used in this talk)
• One more normalization: $D_{ii} = \sum_j \tilde{K}_{ij}$
• Random walk: $P = P_\varepsilon = D^{-1} \tilde{K}$
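The construction above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `diffusion_operator`, the toy data on a circle, and the value of `eps` are all assumptions made for the example.

```python
import numpy as np

def diffusion_operator(X, eps, alpha=1.0):
    """Row-stochastic random walk P = D^{-1} K~ from a Gaussian kernel.

    The name and signature are hypothetical, chosen for this sketch.
    """
    # Pairwise squared Euclidean distances between the rows of X.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)                    # K_ij = exp(-|x_i - x_j|^2 / eps)
    q = K.sum(axis=1)                        # Q_ii = sum_j K_ij
    K_tilde = K / np.outer(q, q) ** alpha    # K~ = Q^{-alpha} K Q^{-alpha}
    d = K_tilde.sum(axis=1)                  # D_ii = sum_j K~_ij
    return K_tilde / d[:, None]              # P = D^{-1} K~

# Toy data (an assumption): 200 points sampled non-uniformly on the unit circle.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
P = diffusion_operator(X, eps=0.1)
```

Each row of `P` sums to one, so `P` can be read as the transition matrix of a random walk on the data points.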
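DIFFUSION MAPS (Coifman, Lafon 2006)
• Define the diffusion distance as: $D_t(x_i, x_j)^2 = \sum_{l=1}^{n} \left( P^t_{il} - P^t_{jl} \right)^2 \frac{1}{\pi_l}$, where $\pi$ is the stationary distribution of $P$
• Theorem [CL06]: For $\alpha = 1$ (assumed from here forward), $\lim_{n \to \infty,\, \varepsilon \to 0} P_\varepsilon^{t/\varepsilon} = e^{t\Delta}$ (the heat kernel)
Heat equation: $\partial_t u = \Delta u$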
DIFFUSION MAPS (Bérard, Besson, Gallot 1994; Coifman, Lafon 2006)
• Let $1 = \lambda_0 > \lambda_1 \geq \cdots \geq \lambda_{n-1} \geq 0$ be the eigenvalues of $P$, with eigenvectors $\mathbf{1} = \psi_0, \psi_1, \ldots, \psi_{n-1}$.
• Define the diffusion map: $\Psi_t(x_i) = (\lambda_1^t \psi_1(x_i), \ldots, \lambda_{n-1}^t \psi_{n-1}(x_i))$ (truncated to give a low dimensional embedding)
• Theorem [BBG94, CL06]: The diffusion distance satisfies: $D_t(x_i, x_j) = \|\Psi_t(x_i) - \Psi_t(x_j)\|$
• Theorem [BBG94]: If we compute $\Psi_t$ using the heat kernel, the pulled back metric $\Psi_t^*$ is asymptotic to the metric $g$ of $\mathcal{M}$ when $t \to 0^+$
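The identity between the diffusion distance and the Euclidean distance in diffusion-map coordinates can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `diffusion_map`, the random toy data, and the values of `eps` and `t` are hypothetical, and the eigenvectors are scaled so that $\psi_0$ is constant with absolute value 1, which is the normalization under which the identity holds exactly with weights $1/\pi_l$.

```python
import numpy as np

def diffusion_map(X, eps, t):
    """Return P, its stationary distribution pi, and the diffusion map Psi_t.

    Hypothetical helper for this sketch; uses alpha = 1 as in the talk.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q, q)             # alpha = 1 density normalization
    d = K_tilde.sum(axis=1)
    P = K_tilde / d[:, None]
    # A = D^{-1/2} K~ D^{-1/2} is symmetric and shares its eigenvalues with P.
    A = K_tilde / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(A)
    order = np.argsort(lam)[::-1]            # sort eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]
    # Right eigenvectors of P, scaled so that psi_0 is constant (+1 or -1).
    psi = np.sqrt(d.sum()) * V / np.sqrt(d)[:, None]
    Psi_t = (lam[1:] ** t) * psi[:, 1:]      # drop the trivial coordinate psi_0
    pi = d / d.sum()                         # stationary distribution of P
    return P, pi, Psi_t

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))                 # toy data, an assumption
t = 3
P, pi, Psi_t = diffusion_map(X, eps=2.0, t=t)

# Diffusion distance computed directly from the rows of P^t ...
Pt = np.linalg.matrix_power(P, t)
i, j = 0, 1
direct = np.sqrt(np.sum((Pt[i] - Pt[j]) ** 2 / pi))
# ... and as a Euclidean distance between diffusion-map coordinates.
embedded = np.linalg.norm(Psi_t[i] - Psi_t[j])
```

Here all $n - 1$ nontrivial coordinates are kept, so `direct` and `embedded` agree up to floating-point error; truncating `Psi_t` to its leading columns gives an approximation whose quality depends on how fast $\lambda_k^t$ decays.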
OUTLINE
• Non-manifold data: Metric trees, biology and PHATE
• Time varying data: Time coupled diffusion maps and condensation
• Future directions and conclusions
Non-manifold trajectory data
[Figure labels: Stem Cell, Blastocysts, Endothelial-Myeloid Progenitors, Endothelium, Myeloid Cells, Muscle Precursors, Vascular muscle cells]
METRIC TREE EMBEDDINGS (Moon, van Dijk, Wang, Burkhardt, Chen, van den Elzen, H., Coifman, Ivanova, Wolf, Krishnaswamy 2017)
Diffusion maps - what is happening here?
DIFFUSION MAPS AND METRIC TREES (Moon et al. 2017)
[Figure: metric tree (colored by edge) and diffusion maps embeddings $x \mapsto (\psi_i(x), \psi_j(x))$ for $i, j = 1, \ldots, 10$]
DIFFUSION MAPS AND METRIC TREES (Moon et al. 2017)
[Figures, one slide per eigenvector: metric tree (colored by edge) alongside eigenvector $\psi_k(x)$ for $k = 1, \ldots, 10$]
DIFFUSION MAPS AND METRIC TREES (Moon et al. 2017)
Conjecture: To embed the tree with diffusion maps as $x \mapsto (\psi_{i_1}(x), \ldots, \psi_{i_k}(x))$, we need $k \sim$ depth of the tree.
Punchline: The information is there, but we need a different way to get at it.
[Figure: metric tree (colored by edge) and eigenvector $\psi_{10}(x)$]
GEOMETRY AND TIME SCALES (Moon et al. 2017)
• Theorem [Varadhan 1967]: Small time diffusions preserve geometry. Let $K(t, x, x')$ be the heat kernel on $\mathcal{M}$. Then: $\lim_{t \to 0^+} t \log K(t, x, x') = -\tfrac{1}{4} r(x, x')^2$
• Numerically, though, this is perilous
• However, on $\mathcal{M} = \mathbb{R}^d$, we have $K(t, x, x') = \frac{1}{(4\pi t)^{d/2}} \exp(-|x - x'|^2 / 4t)$, and so in this case we have for all $t > 0$: $t \log K(t, x, x') = -\tfrac{d}{2}\, t \log(4\pi t) - \tfrac{1}{4} |x - x'|^2$
• Metric trees lie somewhere in between these two regimes, so we propagate $P^t$ for an intermediate value of $t$ and compute the matrix $U(t)$ with entries $U(t)_{ij} = t \log P^t_{ij}$
• We then apply multidimensional scaling (MDS) to the rows of $U(t)$ to get the PHATE embedding
• Open problem to make the above reasoning rigorous
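The recipe on this slide (propagate $P^t$, take $t \log P^t$, apply MDS to the rows) can be sketched as follows. This is a rough illustration under several assumptions, not the PHATE implementation: the function names, the three-armed toy "tree", the values of `eps` and `t`, and the small `floor` added before the logarithm are all hypothetical, and classical (Euclidean) MDS is swapped in as the simplest MDS variant.

```python
import numpy as np

def diffusion_operator(X, eps):
    """Random walk P = D^{-1} K~ with alpha = 1 (hypothetical helper)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q, q)
    return K_tilde / K_tilde.sum(axis=1, keepdims=True)

def log_diffusion_embedding(P, t, k=2, floor=1e-12):
    """Classical MDS on the rows of U(t) = t * log(P^t)."""
    Pt = np.linalg.matrix_power(P, t)
    # 'floor' is an assumption of this sketch; it only guards against log(0).
    U = t * np.log(Pt + floor)
    # Classical MDS on Euclidean distances between rows of U is equivalent to
    # an eigendecomposition of the Gram matrix of the column-centered rows.
    Uc = U - U.mean(axis=0)
    B = Uc @ Uc.T
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:k]            # k largest eigenvalues
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# Toy "tree" (an assumption): three noisy arms meeting at the origin in R^3.
rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 40)
X = np.concatenate([np.outer(s, arm) for arm in np.eye(3)])
X += 0.01 * rng.normal(size=X.shape)

P = diffusion_operator(X, eps=0.01)
Y = log_diffusion_embedding(P, t=20, k=2)    # 120 points embedded in the plane
```

The choice of `t` matters: very small values keep only local structure, while very large values push every row of $P^t$ toward the stationary distribution. The slide's "intermediate value of $t$" corresponds to a setting between those extremes, which this sketch does not attempt to select automatically.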
BACK TO THE BINARY TREE (Moon et al. 2017)
[Figure: metric tree (colored by edge) and its PHATE embedding]
STEM CELL DATA (Moon et al. 2017)
[Figure: PHATE embedding (axes PHATE 1, PHATE 2) with a heat map of genes by cells, labeled Stem Cell, Blastocysts, Endothelial-Myeloid Progenitors, Endothelium, Myeloid Cells, Muscle Precursors, Vascular muscle cells]
STEM CELL DATA (Moon et al. 2017)
[Figure: the same data embedded with PHATE, PCA, tSNE, and diffusion maps. Labels as extracted: Cardiac Prog. Mix of Meso. ESC and NCC Mesoderm Neuroectoderm NCC Prog. Neural Prog.]
MOUSE BEHAVIORAL DATA
OPEN QUESTIONS
• Can we develop rigorous mathematical theory relating diffusion geometry to metric trees?
• If so, can these ideas in turn shed light on the more general problems of intersecting hyperplanes or intersecting manifolds?
• All of these directions are potentially relevant for the topics in the remainder of this talk
Time varying manifold data
TIME VARYING DATA MODEL (Marshall, H. 2017)
What about time varying data but with minimal assumptions on the data generation process?
• Manifold model: Compact Riemannian manifold $(\mathcal{M}, g(t))$ with smoothly varying metric $g(t)$
• New heat equation: $\partial_t u = \Delta_{g(t)} u$ (couples heat diffusion with changing geometry)
• Theorem [Guenther 2002]: There exists a fundamental solution (heat kernel) $Z(x, t; x', s)$ for the above heat equation.
[Figure: snapshots at t = 1, 20, 40, 60, 80, 100, 120, 140, 160, 180]