Manifold Regularization Lorenzo Rosasco 9.520 Class 10 March 6, 2011 L. Rosasco Manifold Regularization

About this class
Goal:
- To analyze the limits of learning from examples in high dimensional spaces.
- To introduce the semi-supervised setting and the use of unlabeled data to learn the intrinsic geometry of a problem.
- To define Riemannian manifolds, manifold Laplacians, and graph Laplacians.
- To introduce a new class of algorithms based on manifold regularization (LapRLS, LapSVM).

Unlabeled data
Why use unlabeled data?
- Labeling is often an "expensive" process.
- Semi-supervised learning is the natural setting for human learning.

Semi-supervised Setting
$u$ i.i.d. samples $\{x_1, x_2, \dots, x_u\}$ are drawn on $X$ from the marginal distribution $p(x)$, only $n$ of which are endowed with labels $\{y_1, y_2, \dots, y_n\}$ drawn from the conditional distributions $p(y|x)$. The extra $u - n$ unlabeled samples give additional information about the marginal distribution $p(x)$.

The importance of unlabeled data

Curse of dimensionality and $p(x)$
Assume $X$ is the $D$-dimensional hypercube $[0, 1]^D$. The worst case scenario corresponds to a uniform marginal distribution $p(x)$.
Local methods: A prototype example of the effect of high dimensionality can be seen in nearest neighbor techniques. As $D$ increases, local techniques (e.g. nearest neighbors) rapidly become ineffective.

Curse of dimensionality and k-NN
It would seem that with a reasonably large set of training data, we could always approximate the conditional expectation by k-nearest-neighbor averaging. We should be able to find a fairly large set of observations close to any $x \in [0, 1]^D$ and average them. This approach and our intuition break down in high dimensions.

Sparse sampling in high dimension
Suppose we send out a cubical neighborhood about one vertex to capture a fraction $r$ of the observations. Since this corresponds to a fraction $r$ of the unit volume, the expected edge length will be
$$e_D(r) = r^{1/D}.$$
Already in ten dimensions $e_{10}(0.01) = 0.63$; that is, to capture 1% of the data, we must cover 63% of the range of each input variable! No more "local" neighborhoods!
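
The edge-length formula is easy to tabulate. The following is a minimal sketch; the function name `edge_length` and the chosen dimensions are illustrative and not part of the slides.

```python
def edge_length(r, D):
    """Edge length of a sub-cube of [0, 1]^D that holds a fraction r of the volume."""
    return r ** (1.0 / D)

# How much of each coordinate's range is needed to capture 1% of uniform data.
for D in (1, 2, 3, 10, 100):
    print(f"D = {D:3d}: edge length = {edge_length(0.01, D):.3f}")
```

The required edge length approaches 1 as D grows, which is why a neighborhood that holds a fixed fraction of the data stops being local.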

Distance vs volume in high dimensions

Intrinsic dimensionality
The raw format of natural data is often high dimensional, but in many cases it is the outcome of some process involving only a few degrees of freedom.
Examples:
- Acoustic phonetics ⇒ the vocal tract can be modelled as a sequence of a few tubes.
- Facial expressions ⇒ the tonus of several facial muscles controls facial expression.
- Pose variations ⇒ several joint angles control the combined pose of the elbow-wrist-finger system.
Smoothness assumption: the $y$'s are "smooth" relative to the natural degrees of freedom, not relative to the raw format.

Manifold embedding

Riemannian Manifolds
A $d$-dimensional manifold $M = \bigcup_\alpha U_\alpha$ is a mathematical object that generalizes domains in $\mathbb{R}^d$. Each one of the "patches" $U_\alpha$ which cover $M$ is endowed with a system of coordinates $\alpha : U_\alpha \to \mathbb{R}^d$. If two patches $U_\alpha$ and $U_\beta$ overlap, the transition functions $\beta \circ \alpha^{-1} : \alpha(U_\alpha \cap U_\beta) \to \mathbb{R}^d$ must be smooth (e.g. infinitely differentiable). The Riemannian manifold inherits from its local system of coordinates most geometrical notions available on $\mathbb{R}^d$: metrics, angles, volumes, etc.

Manifold's charts

Differentiation over manifolds
Since each point $x$ over $M$ is equipped with a local system of coordinates in $\mathbb{R}^d$ (its tangent space), all differential operators defined on functions over $\mathbb{R}^d$ can be extended to analogous operators on functions over $M$.
Gradient: $\nabla f(x) = \left( \frac{\partial}{\partial x_1} f(x), \dots, \frac{\partial}{\partial x_d} f(x) \right) \Rightarrow \nabla_M f(x)$
Laplacian: $\triangle f(x) = -\frac{\partial^2}{\partial x_1^2} f(x) - \dots - \frac{\partial^2}{\partial x_d^2} f(x) \Rightarrow \triangle_M f(x)$

Measuring smoothness over M
Given $f : M \to \mathbb{R}$:
- $\nabla_M f(x)$ represents amplitude and direction of variation around $x$.
- $S(f) = \int_M \|\nabla_M f(x)\|^2 \, dp(x)$ is a global measure of smoothness for $f$.
- Stokes' theorem (generalization of integration by parts) links gradient and Laplacian:
$$S(f) = \int_M \|\nabla_M f(x)\|^2 \, dp(x) = \int_M f(x) \, \triangle_M f(x) \, dp(x)$$

Manifold regularization (Belkin, Niyogi, Sindhwani, 04)
A new class of techniques which extend standard Tikhonov regularization over RKHS, introducing the additional regularizer
$$\|f\|_I^2 = \int_M f(x) \, \triangle_M f(x) \, dp(x)$$
to enforce smoothness of solutions relative to the underlying manifold:
$$f^* = \arg\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \lambda_A \|f\|_K^2 + \lambda_I \int_M f(x) \, \triangle_M f(x) \, dp(x)$$
- $\lambda_I$ controls the complexity of the solution in the intrinsic geometry of $M$.
- $\lambda_A$ controls the complexity of the solution in the ambient space.

Manifold regularization (cont.)
Other natural choices of $\|\cdot\|_I^2$ exist:
- Iterated Laplacians $\int_M f \, \triangle_M^s f$ and their linear combinations. These smoothness penalties are related to Sobolev spaces:
$$\int f(x) \, \triangle_M^s f(x) \, dp(x) \approx \sum_{\omega \in \mathbb{Z}^d} \|\omega\|^{2s} |\hat{f}(\omega)|^2$$
- Frobenius norm of the Hessian (the matrix of second derivatives of $f$). Hessian Eigenmaps; Donoho, Grimes 03.
- Diffusion regularizers $\int_M f \, e^{t\triangle}(f)$. The semigroup of smoothing operators $G = \{ e^{-t\triangle_M} \mid t > 0 \}$ corresponds to the process of diffusion (Brownian motion) on the manifold.

An empirical proxy of the manifold
We cannot compute the intrinsic smoothness penalty
$$\|f\|_I^2 = \int_M f(x) \, \triangle_M f(x) \, dp(x)$$
because we don't know the marginal distribution or the manifold $M$ and the embedding $\Phi : M \to \mathbb{R}^D$.
But we assume that the unlabeled samples are drawn i.i.d. from the uniform probability distribution over $M$ and then mapped into $\mathbb{R}^D$ by $\Phi$.

Neighborhood graph
Our proxy of the manifold is a weighted neighborhood graph $G = (V, E, W)$, with vertices $V$ given by the points $\{x_1, x_2, \dots, x_u\}$, edges $E$ defined by one of the two following adjacency rules:
- connect $x_i$ to its $k$ nearest neighbors
- connect $x_i$ to $\epsilon$-close points
and weights $W_{ij}$ associated to two connected vertices:
$$W_{ij} = e^{-\frac{\|x_i - x_j\|^2}{\epsilon}}$$
Note: computational complexity $O(u^2)$.
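
The first adjacency rule together with the exponential weights can be sketched in NumPy as follows. This is a minimal illustration, not the construction used in the course: the function name `knn_heat_graph`, the default parameters, and the symmetrization rule (connect two points if either is among the other's k nearest neighbors) are assumptions, since the slide does not say how to make the k-NN relation symmetric.

```python
import numpy as np

def knn_heat_graph(X, k=5, eps=1.0):
    """Weight matrix W with W_ij = exp(-||x_i - x_j||^2 / eps) on a symmetrized k-NN graph."""
    u = X.shape[0]
    sq_norms = np.sum(X ** 2, axis=1)
    # All pairwise squared distances: this is the O(u^2) step.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)   # clip tiny negatives from round-off
    np.fill_diagonal(sq_dists, np.inf)        # a point is not its own neighbor
    nn_idx = np.argsort(sq_dists, axis=1)[:, :k]
    adjacency = np.zeros((u, u), dtype=bool)
    adjacency[np.repeat(np.arange(u), k), nn_idx.ravel()] = True
    adjacency |= adjacency.T                  # assumed rule: keep an edge if either side selects it
    return np.where(adjacency, np.exp(-sq_dists / eps), 0.0)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 3))         # toy data, purely illustrative
W_demo = knn_heat_graph(X_demo, k=5, eps=2.0)
```

The dense distance matrix keeps the sketch short; for large u, a spatial index or an approximate nearest-neighbor search would avoid storing all pairs.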

Neighborhood graph (cont.)

The graph Laplacian
The graph Laplacian over the weighted neighborhood graph $G = (V, E, W)$ is the matrix
$$L = D - W, \qquad D \text{ diagonal with } D_{ii} = \sum_j W_{ij}.$$
$L$ is the discrete counterpart of the manifold Laplacian $\triangle_M$:
$$f^T L f = \frac{1}{2} \sum_{i,j=1}^{u} W_{ij} (f_i - f_j)^2 \approx \int_M \|\nabla f(x)\|^2 \, dp(x),$$
where the approximation holds up to a scaling factor.
Analogous properties of the eigensystem: nonnegative spectrum, null space.
Looking for rigorous convergence results.
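
A short numerical check makes the link between the matrix and the pairwise sum concrete. For a symmetric weight matrix the quadratic form satisfies f^T L f = (1/2) Σ_ij W_ij (f_i − f_j)^2, the factor 1/2 arising because the double sum counts each pair twice. The sketch below uses a random symmetric W as stand-in data; the names are illustrative.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal matrix of row sums."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2.0            # random symmetric weights, stand-in for a neighborhood graph
np.fill_diagonal(W, 0.0)
L = graph_laplacian(W)

f = rng.standard_normal(6)     # values of a function on the 6 vertices
quad_form = f @ L @ f
pairwise_sum = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
eigenvalues = np.linalg.eigvalsh(L)

print(np.isclose(quad_form, pairwise_sum))   # the two expressions agree
print(eigenvalues.min() >= -1e-10)           # nonnegative spectrum
print(np.allclose(L @ np.ones(6), 0.0))      # constants lie in the null space
```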

A convergence theorem (Belkin, Niyogi, 05)
Operator $\mathcal{L}$: "out-of-sample extension" of the graph Laplacian $L$
$$\mathcal{L}(f)(x) = \sum_i (f(x) - f(x_i)) \, e^{-\frac{\|x - x_i\|^2}{\epsilon}}, \qquad x \in X, \; f : X \to \mathbb{R}$$
Theorem: Let the $u$ data points $\{x_1, \dots, x_u\}$ be sampled from the uniform distribution over the embedded $d$-dimensional manifold $M$. Put $\epsilon = u^{-\alpha}$, with $0 < \alpha < \frac{1}{2 + d}$. Then for all $f \in C^\infty$ and $x \in X$, there is a constant $C$, s.t. in probability,
$$\lim_{u \to \infty} C \, \frac{\epsilon^{-\frac{d+2}{2}}}{u} \, \mathcal{L}(f)(x) = \triangle_M f(x).$$
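
The out-of-sample operator is a weighted sum over the samples and can be evaluated at any point, not only at graph vertices. The sketch below implements only the unnormalized sum; it does not apply the scaling from the theorem, because the constant C is not specified on the slide. The function name and argument layout are assumptions.

```python
import numpy as np

def out_of_sample_laplacian(f, x, samples, eps):
    """Evaluate sum_i (f(x) - f(x_i)) * exp(-||x - x_i||^2 / eps) at a single point x."""
    sq_dists = np.sum((samples - x) ** 2, axis=1)
    weights = np.exp(-sq_dists / eps)
    f_samples = np.array([f(s) for s in samples])
    return np.sum((f(x) - f_samples) * weights)

# Samples placed symmetrically around the evaluation point (toy example).
samples = np.array([[-1.0], [1.0]])
x0 = np.array([0.0])
value_linear = out_of_sample_laplacian(lambda p: p[0], x0, samples, eps=1.0)
value_const = out_of_sample_laplacian(lambda p: 3.0, x0, samples, eps=1.0)
```

Both toy values vanish: a constant function has no variation, and for a linear function the contributions of the two symmetric samples cancel, matching the behavior expected of a Laplacian.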

Laplacian-based regularization algorithms (Belkin et al. 04)
Replacing the unknown manifold Laplacian with the graph Laplacian, $\|f\|_I^2 = \frac{1}{u^2} \mathbf{f}^T L \mathbf{f}$, where $\mathbf{f}$ is the vector $[f(x_1), \dots, f(x_u)]$, we get the minimization problem
$$f^* = \arg\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \lambda_A \|f\|_K^2 + \frac{\lambda_I}{u^2} \mathbf{f}^T L \mathbf{f}$$
- $\lambda_I = 0$: standard regularization (RLS and SVM)
- $\lambda_A \to 0$: out-of-sample extension for graph regularization
- $n = 0$: unsupervised learning, spectral clustering

The Representer Theorem
Using the same type of reasoning as for standard regularization networks, a Representer Theorem can be proved for the solutions of Manifold Regularization algorithms. The expansion ranges over all the supervised and unsupervised data points:
$$f(x) = \sum_{j=1}^{u} c_j K(x, x_j).$$
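
Once the coefficients are known, the expansion gives the value of the solution at any new point. A minimal sketch, assuming a Gaussian kernel (the slides do not fix a kernel; `gaussian_kernel`, `evaluate_expansion`, and the width `sigma` are illustrative choices):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Matrix of K(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)); the kernel choice is an assumption."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def evaluate_expansion(c, X_train, X_new, sigma=1.0):
    """f(x) = sum_j c_j K(x, x_j), summing over all u labeled and unlabeled training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ c

rng = np.random.default_rng(0)
X_train = rng.standard_normal((8, 2))   # u = 8 points, labeled and unlabeled together
c = rng.standard_normal(8)              # placeholder coefficients, not a fitted solution
f_new = evaluate_expansion(c, X_train, rng.standard_normal((3, 2)))
```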

LapRLS
Generalizes the usual RLS algorithm to the semi-supervised setting. Set $V(w, y) = (w - y)^2$ in the general functional.
By the representer theorem, the minimization problem can be restated as follows:
$$c^* = \arg\min_{c \in \mathbb{R}^u} \frac{1}{n} (y - JKc)^T (y - JKc) + \lambda_A c^T K c + \frac{\lambda_I}{u^2} c^T K L K c,$$
where $y$ is the $u$-dimensional vector $(y_1, \dots, y_n, 0, \dots, 0)$, and $J$ is the $u \times u$ matrix $\mathrm{diag}(1, \dots, 1, 0, \dots, 0)$.
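
The functional above is quadratic in c, so a minimizer can be obtained from a linear system. Setting the gradient to zero and using Jy = y shows that any c solving (JK + n λ_A I + (n λ_I / u²) LK) c = y is a minimizer; this step is a derivation added here, not something stated on the slide. The sketch below solves that system on toy data; the function names, the Gaussian kernel, the fully connected weight matrix, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def lap_rls_coefficients(K, L, y_labeled, lam_A, lam_I):
    """Solve (J K + n lam_A I + (n lam_I / u^2) L K) c = y; labeled points come first in K and L."""
    u = K.shape[0]
    n = len(y_labeled)
    y = np.zeros(u)
    y[:n] = y_labeled                                # labels padded with zeros
    J = np.diag((np.arange(u) < n).astype(float))    # diag(1, ..., 1, 0, ..., 0)
    A = J @ K + n * lam_A * np.eye(u) + (n * lam_I / u ** 2) * (L @ K)
    return np.linalg.solve(A, y)

def lap_rls_objective(c, K, L, y_labeled, lam_A, lam_I):
    """Value of the LapRLS functional at c."""
    u = K.shape[0]
    n = len(y_labeled)
    residual = y_labeled - (K @ c)[:n]               # nonzero part of y - J K c
    fit = residual @ residual / n
    ambient = lam_A * (c @ K @ c)
    intrinsic = (lam_I / u ** 2) * (c @ K @ L @ K @ c)
    return fit + ambient + intrinsic

# Toy problem: u = 30 points in the plane, the first n = 6 carry labels.
rng = np.random.default_rng(0)
u, n = 30, 6
X = rng.standard_normal((u, 2))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-sq_dists / 2.0)                          # Gaussian kernel, arbitrary width
W = np.exp(-sq_dists)                                # dense weights with epsilon = 1
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
y_labeled = np.sign(X[:n, 0])
c_star = lap_rls_coefficients(K, L, y_labeled, lam_A=1e-2, lam_I=1e-1)
```

With λ_A > 0 the system matrix is invertible, so `np.linalg.solve` applies directly; predictions at new points then follow from the kernel expansion with coefficients `c_star`.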
