Estimating Filaments and Manifolds Larry Wasserman Dept of Statistics and Machine Learning Department Carnegie Mellon University June 2012
Co-authors Geometry: Chris Genovese, Marco Perone-Pacifico and Isa Verdinelli Topology: Sivaraman Balakrishnan, Ale Rinaldo, Don Sheehy and Aarti Singh
Introduction The Geometric problem: find a manifold $\hat{M}$ which is close to an unknown manifold $M$. Topological problem: find a manifold $\hat{M}$ which has the same homology as an unknown manifold $M$. When the manifold is one-dimensional we call it a filament. We are not using manifolds for dimension reduction. We are interested in estimating the manifold. Genovese, Perone-Pacifico, Verdinelli, Wasserman (2010) (arXiv:1003.5536, arXiv:1007.0549, arXiv:1109.4540). Rinaldo, Sheehy, Balakrishnan, Singh, Wasserman (2011).
Motivating Example: The Cosmic Web
Example
Example
Low-Dimensional Structure in Point Cloud Data
Many datasets exhibit complex, low-dimensional structure. More Examples:
• Networks of blood vessels in medical imaging.
• River and road systems in remote sensing.
• Fault lines in seismology.
• Landmark paths for moving objects in computer vision.
In addition, high-dimensional datasets often have hidden structure that we would like to identify. Several distinct problems here, including: Dimension Reduction, Clustering, and Estimation.
Manifolds and Manifold Complexes Manifolds give a useful representation of low dimensional structure. A manifold is a space that looks locally like a Euclidean space of some dimension (called the dimension of the manifold). Examples: point (0-dim), filaments (1-dim), surface of the sphere or torus (2-dim), three-dimensional sphere, space-time (4-dim). To allow for intersections and other complexities, consider a union of manifolds embedded in $\mathbb{R}^D$ with maximal dimension $d < D$. We call this a $d$-dimensional manifold complex.
Outline
1. The Geometric Problem
   1. Minimax Theory
   2. Methods
2. The Topological Problem
Minimax Manifold Estimation
• $Y_1, \ldots, Y_n$ are noisy measurements near a manifold $M$.
• $M$ is a $d$-manifold embedded in $\mathbb{R}^D$.
• $G$ is a distribution supported on $M$.
• Four different noise models:
  1. noiseless: $Y_i \sim G$ where $\mathrm{support}(G) = M$.
  2. clutter: $Y_i \sim (1 - \pi) U + \pi G$ where $U$ is uniform.
  3. perpendicular: $Y_i = X_i + \epsilon_i$ where $X_i \sim G$ and $\epsilon_i$ is perpendicular to $M$ (Niyogi, Smale, Weinberger 2008).
  4. additive: $Y_i = X_i + \epsilon_i$ and $\epsilon_i \sim \Phi$.
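The four noise models can be simulated in a few lines. The sketch below is only an illustration under assumptions that are not in the slides: $M$ is the unit circle in $\mathbb{R}^2$, $G$ is uniform on $M$, the clutter distribution $U$ is uniform on the box $[-2, 2]^2$, the perpendicular noise is uniform on $[-\sigma, \sigma]$ along the normal, and $\Phi$ is taken to be an isotropic Gaussian. The function name and parameters are hypothetical.

```python
import numpy as np

def sample_noise_models(n=500, sigma=0.1, pi=0.8, seed=0):
    """Draw n points from each of the four noise models around the unit circle.

    Assumptions (not from the slides): M is the unit circle, G is uniform on M,
    U is uniform on [-2, 2]^2, perpendicular noise is Uniform(-sigma, sigma),
    and Phi is an isotropic Gaussian with standard deviation sigma.
    """
    rng = np.random.default_rng(seed)

    # X_i ~ G: uniform on the unit circle. For a circle centered at the
    # origin, the unit normal at X_i is X_i itself.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    X = np.column_stack([np.cos(theta), np.sin(theta)])

    # 1. noiseless: Y_i ~ G
    noiseless = X.copy()

    # 2. clutter: Y_i ~ (1 - pi) U + pi G
    U = rng.uniform(-2.0, 2.0, size=(n, 2))
    from_G = rng.random(n) < pi
    clutter = np.where(from_G[:, None], X, U)

    # 3. perpendicular: Y_i = X_i + eps_i with eps_i along the normal at X_i
    eps = rng.uniform(-sigma, sigma, size=n)
    perpendicular = X + eps[:, None] * X

    # 4. additive: Y_i = X_i + eps_i with eps_i ~ Phi (Gaussian here)
    additive = X + rng.normal(0.0, sigma, size=(n, 2))

    return {
        "noiseless": noiseless,
        "clutter": clutter,
        "perpendicular": perpendicular,
        "additive": additive,
    }

samples = sample_noise_models()
```

In this toy setting the perpendicular samples stay within distance $\sigma$ of the circle, while the additive samples can land arbitrarily far from it.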
Minimax Manifold Estimation
• Let $Q_M$ be the induced distribution on $Y$.
• Let $\mathcal{Q} = \{Q_M : M \in \mathcal{M}\}$.
• Loss function: Hausdorff distance $H(M, \hat{M})$, where $H(A, B) = \inf\{\epsilon : A \subset B \oplus \epsilon \text{ and } B \subset A \oplus \epsilon\}$, $A \oplus \epsilon = \bigcup_{x \in A} B(x, \epsilon)$, and $B(x, \epsilon) = \{y : \|x - y\| \le \epsilon\}$.
• Goal: determine $\inf_{\hat{M}} \sup_{Q \in \mathcal{Q}} E_Q\, H(\hat{M}, M)$.
Hausdorff Distance [Figure: two sets $A$ and $B$.] $H(A, B) = \max\{2.5, 1.5\} = 2.5$
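For finite point sets, the Hausdorff distance reduces to the larger of two directed distances: the farthest any point of $A$ is from $B$, and the farthest any point of $B$ is from $A$. The sketch below computes it by brute force with NumPy. The function name `hausdorff` and the two small point sets are made up for illustration, chosen so that the directed distances are 2.5 and 1.5.

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (rows are points)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # D[i, j] = Euclidean distance between A[i] and B[j]
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    a_to_b = D.min(axis=1).max()  # smallest eps with A inside B (+) eps
    b_to_a = D.min(axis=0).max()  # smallest eps with B inside A (+) eps
    return float(max(a_to_b, b_to_a))

# Hypothetical sets: the directed distances are 2.5 (A to B) and 1.5 (B to A).
A = [[0.0, 0.0], [4.0, 0.0]]
B = [[2.5, 0.0]]
print(hausdorff(A, B))  # 2.5
```

The brute-force distance matrix costs O(|A||B|) memory, which is fine for small samples; larger point clouds would typically call for a spatial index instead.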
Condition Number (or Reach)
• $\Delta(M)$ is the largest number $\kappa$ such that, if $d(x, M) \le \kappa$, then $x$ has a unique projection onto $M$.
• Intuitively, a ball of radius $\le \Delta(M)$ can roll freely but a ball of radius $> \Delta(M)$ cannot roll freely.
• $\Delta(M)$ large means: $M$ is smooth and not close to being self-intersecting.
• $\mathcal{M} = \{M : \Delta(M) \ge \kappa\}$.
• See Niyogi, Smale and Weinberger (2009) for more on condition number.
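To get a feel for $\Delta(M)$ numerically, one option is a pairwise formula that is not in these slides: a classical characterization due to Federer states that the reach equals the infimum over pairs $p \ne q$ in $M$ of $\|q - p\|^2 / (2\, d(q - p, T_p M))$, where $T_p M$ is the tangent space at $p$. The sketch below evaluates this ratio over a finite sample of a closed planar curve with known unit tangents, so it should be read as an approximation from above rather than an estimator studied in the talk. The helper names `reach_from_sample` and `ellipse` are hypothetical.

```python
import numpy as np

def reach_from_sample(points, tangents):
    """Minimum of ||q - p||^2 / (2 * dist(q - p, T_p M)) over sample pairs.

    points:   (n, D) array of points on a curve.
    tangents: (n, D) array of unit tangent vectors at those points.
    Illustrative only: uses a finite sample and exact tangents.
    """
    diff = points[None, :, :] - points[:, None, :]     # diff[i, j] = q_j - p_i
    sq_dist = np.sum(diff ** 2, axis=2)
    along = np.einsum("ijk,ik->ij", diff, tangents)    # component along T_{p_i}
    perp = np.sqrt(np.clip(sq_dist - along ** 2, 0.0, None))
    ratio = np.full(sq_dist.shape, np.inf)
    ok = perp > 0                                      # skips the diagonal p = q
    ratio[ok] = sq_dist[ok] / (2.0 * perp[ok])
    return float(ratio.min())

def ellipse(a, b, n=400):
    """Sample an ellipse with semi-axes a, b, together with unit tangents."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    pts = np.column_stack([a * np.cos(t), b * np.sin(t)])
    vel = np.column_stack([-a * np.sin(t), b * np.cos(t)])
    return pts, vel / np.linalg.norm(vel, axis=1, keepdims=True)

circle_pts, circle_tan = ellipse(2.0, 2.0)   # circle of radius 2
flat_pts, flat_tan = ellipse(2.0, 1.0)       # flatter ellipse
print(reach_from_sample(circle_pts, circle_tan))
print(reach_from_sample(flat_pts, flat_tan))
```

For the circle of radius 2 every pair gives the same ratio, so the value is the radius up to rounding. For the ellipse the minimum is attained near the ends of the major axis, where the curve bends most sharply, and the value is close to $b^2/a = 0.5$.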
Condition Number (figure from Gonzalez and Maddocks, 1999) A large value of $\Delta(M)$ means the manifold is smooth and far from looping around itself.
Condition Number in One Dimension [Figure: circles of radius $r$; panels labeled $\kappa < r$, $\kappa > r$, $\kappa < 2r$, $\kappa > 2r$.]
Normals of size < ∆ do not Cross
A Synthetic Example A 2-d Manifold in 3-d space