Geometric perspectives for supervised dimension reduction
A Tale of Two Manifolds

S. Mukherjee, K. Mao, F. Liang, Q. Wu, D.-X. Zhou, J. Guinney
Department of Statistical Science, Institute for Genome Sciences & Policy, Department of Computer Science, Department of Mathematics
Duke University
December 11, 2009
Supervised dimension reduction: Information and sufficiency

A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R.A. Fisher (beloved Bayesian) and goes back to at least Adcock 1878 and Edgeworth 1884.

$X_1, \ldots, X_n$ drawn iid from a Gaussian can be reduced to $\mu, \sigma^2$.
Supervised dimension reduction: Regression

Assume the model
$$Y = f(X) + \varepsilon, \qquad \mathbb{E}\,\varepsilon = 0,$$
with $X \in \mathcal{X} \subset \mathbb{R}^p$ and $Y \in \mathbb{R}$.

Data: $D = \{(x_i, y_i)\}_{i=1}^n \overset{\text{iid}}{\sim} \rho(X, Y)$.
Supervised dimension reduction: Dimension reduction

If the data live in a $p$-dimensional space, $X \in \mathbb{R}^p$, replace $X$ with $\Theta(X) \in \mathbb{R}^d$, $d \ll p$.

My belief: physical, biological and social systems are inherently low dimensional, and the variation of interest in these systems can be captured by a low-dimensional submanifold.
Supervised dimension reduction: Supervised dimension reduction (SDR)

Given response variables $Y_1, \ldots, Y_n \in \mathbb{R}$ and explanatory variables or covariates $X_1, \ldots, X_n \in \mathcal{X} \subset \mathbb{R}^p$,
$$Y_i = f(X_i) + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathrm{No}(0, \sigma^2).$$

Is there a submanifold $S \equiv S_{Y|X}$ such that $Y \perp\!\!\!\perp X \mid P_S(X)$?
Supervised dimension reduction: Visualization of SDR

[Figure: four two-dimensional panels, (a) Data, (b) Diffusion map, (c) GOP, (d) GDM, each showing the embedded coordinates (Dimension 1 vs. Dimension 2).]
Supervised dimension reduction: Linear projections capture nonlinear manifolds

In this talk $P_S(X) = B^T X$ where $B = (b_1, \ldots, b_d)$.

Semiparametric model
$$Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i,$$
and span $B$ is the dimension reduction (d.r.) subspace.
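To make the model concrete, here is a minimal NumPy sketch that draws data from a semiparametric model of this form; the choices of $p = 10$, $d = 2$, the link function $g$, and the noise level are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from Y = g(b_1^T X, b_2^T X) + eps with p = 10 and d = 2
# (dimensions, link g, and noise level are illustrative choices).
n, p, d = 500, 10, 2
B = np.linalg.qr(rng.standard_normal((p, d)))[0]   # orthonormal d.r. directions b_1, b_2
X = rng.standard_normal((n, p))
U = X @ B                                          # the d.r. coordinates b_k^T X
y = np.sin(U[:, 0]) + U[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
```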
Learning gradients: SDR model

Semiparametric model
$$Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \ldots, b_d^T X_i) + \varepsilon_i,$$
and span $B$ is the dimension reduction (d.r.) subspace.

Assume the marginal distribution $\rho_X$ is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ of dimension $d \ll p$.
Learning gradients: Gradients and outer products

Given a smooth function $f$, the gradient is
$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_p} \right)^T.$$

Define the gradient outer product (GOP) matrix $\Gamma$:
$$\Gamma_{ij} = \int_{\mathcal{X}} \frac{\partial f}{\partial x_i}(x)\, \frac{\partial f}{\partial x_j}(x)\, d\rho_X(x), \qquad \Gamma = \mathbb{E}\left[ (\nabla f) \otimes (\nabla f) \right].$$
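In practice the expectation is replaced by an empirical average over the sample. A minimal sketch, assuming a hypothetical array `grads` of gradient estimates at the sample points (e.g., produced by the gradient estimator discussed later in the talk):

```python
import numpy as np

def gop(grads):
    """Empirical gradient outer product: (1/n) * sum_i grad_i grad_i^T.

    grads : array of shape (n, p), one (estimated) gradient per sample point.
    """
    grads = np.asarray(grads, dtype=float)
    return grads.T @ grads / grads.shape[0]
```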
Learning gradients: GOP captures the d.r. space

Suppose
$$y = f(X) + \varepsilon = g(b_1^T X, \ldots, b_d^T X) + \varepsilon.$$

Note that for $B = (b_1, \ldots, b_d)$,
$$\Gamma b_i = \lambda_i b_i.$$

For $i = 1, \ldots, d$,
$$\frac{\partial f(x)}{\partial b_i} = b_i^T \nabla f(x) \not\equiv 0 \;\Rightarrow\; b_i^T \Gamma b_i \neq 0.$$
If $w \perp b_i$ for all $i$, then $w^T \Gamma w = 0$.
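So the d.r. directions can be read off as the eigenvectors of $\Gamma$ with nonzero eigenvalues. A short sketch of that step; truncating to the top $d$ eigenvectors is the only choice made here:

```python
import numpy as np

def dr_directions(Gamma, d):
    """Return the top-d eigenvectors of the symmetric GOP matrix Gamma;
    their span estimates the dimension reduction subspace."""
    evals, evecs = np.linalg.eigh(Gamma)      # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:d]       # indices of the d largest eigenvalues
    return evecs[:, order], evals[order]
```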
Learning gradients: Statistical interpretation

Linear case:
$$y = \beta^T x + \varepsilon, \qquad \varepsilon \overset{\text{iid}}{\sim} \mathrm{No}(0, \sigma^2).$$
Let $\Omega = \mathrm{cov}(\mathbb{E}[X \mid Y])$, $\Sigma_X = \mathrm{cov}(X)$, $\sigma_Y^2 = \mathrm{var}(Y)$. Then
$$\Gamma = \sigma_Y^2 \left( 1 - \frac{\sigma^2}{\sigma_Y^2} \right)^2 \Sigma_X^{-1} \Omega\, \Sigma_X^{-1} \approx \sigma_Y^2\, \Sigma_X^{-1} \Omega\, \Sigma_X^{-1},$$
the approximation holding when $\sigma^2 \ll \sigma_Y^2$.
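A rough Monte Carlo check of this relation, under assumptions not made on the slide: $\Sigma_X = I$, small noise, and a SIR-style slicing estimate of $\Omega = \mathrm{cov}(\mathbb{E}[X \mid Y])$. In the linear case $\nabla f \equiv \beta$, so the exact GOP is $\beta\beta^T$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear model with Sigma_X = I and small noise, so the approximation is tight.
n, p = 20000, 5
beta = rng.standard_normal(p)
X = rng.standard_normal((n, p))
y = X @ beta + 0.1 * rng.standard_normal(n)

Gamma = np.outer(beta, beta)                      # exact GOP: the gradient is constant = beta

# SIR-style estimate of Omega = cov(E[X|Y]): slice y, average X within slices.
n_slices = 20
edges = np.quantile(y, np.linspace(0, 1, n_slices + 1))
idx = np.clip(np.digitize(y, edges[1:-1]), 0, n_slices - 1)
means = np.vstack([X[idx == k].mean(axis=0) for k in range(n_slices)])
w = np.array([(idx == k).mean() for k in range(n_slices)])
mu = (w[:, None] * means).sum(axis=0)
Omega = (w[:, None, None] *
         np.einsum('ki,kj->kij', means - mu, means - mu)).sum(axis=0)

Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))
rhs = y.var() * Sigma_inv @ Omega @ Sigma_inv
print(np.linalg.norm(Gamma - rhs) / np.linalg.norm(Gamma))   # small relative error
```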
Learning gradients: Statistical interpretation

For smooth $f(x)$,
$$y = f(x) + \varepsilon, \qquad \varepsilon \overset{\text{iid}}{\sim} \mathrm{No}(0, \sigma^2),$$
the role of $\Omega = \mathrm{cov}(\mathbb{E}[X \mid Y])$ is not so clear.
Learning gradients: Nonlinear case

Partition $\mathcal{X}$ into sections and compute local quantities:
$$\mathcal{X} = \bigcup_{i=1}^{I} \chi_i,$$
$$\Omega_i = \mathrm{cov}\!\left(\mathbb{E}[X_{\chi_i} \mid Y_{\chi_i}]\right), \quad \Sigma_i = \mathrm{cov}(X_{\chi_i}), \quad \sigma_i^2 = \mathrm{var}(Y_{\chi_i}), \quad m_i = \rho_X(\chi_i).$$
Then
$$\Gamma \approx \sum_{i=1}^{I} m_i\, \sigma_i^2\, \Sigma_i^{-1} \Omega_i\, \Sigma_i^{-1}.$$
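One crude way to instantiate this sum numerically; the partition into nearest-anchor cells and the within-section slicing estimate of $\Omega_i$ below are illustrative choices, not the construction used in the talk.

```python
import numpy as np

def local_gop(X, y, n_sections=10, n_slices=4, seed=0):
    """Approximate Gamma by sum_i m_i * sigma_i^2 * Sigma_i^{-1} Omega_i Sigma_i^{-1}
    over a crude partition of the covariate space (nearest-anchor cells)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    anchors = X[rng.choice(n, n_sections, replace=False)]
    labels = np.argmin(((X[:, None, :] - anchors[None]) ** 2).sum(-1), axis=1)

    Gamma = np.zeros((p, p))
    for k in range(n_sections):
        Xk, yk = X[labels == k], y[labels == k]
        if len(yk) < 5 * p:                        # skip sections too small for stable covariances
            continue
        m_k = len(yk) / n                          # mass rho_X(chi_k)
        Sigma_inv = np.linalg.inv(np.cov(Xk, rowvar=False) + 1e-6 * np.eye(p))
        # slicing estimate of Omega_k = cov(E[X|Y]) inside the section
        edges = np.quantile(yk, np.linspace(0, 1, n_slices + 1))
        idx = np.clip(np.digitize(yk, edges[1:-1]), 0, n_slices - 1)
        means = np.vstack([Xk[idx == j].mean(0) if np.any(idx == j) else Xk.mean(0)
                           for j in range(n_slices)])
        w = np.array([(idx == j).mean() for j in range(n_slices)])
        mu = (w[:, None] * means).sum(0)
        Omega_k = (w[:, None, None] *
                   np.einsum('ki,kj->kij', means - mu, means - mu)).sum(0)
        Gamma += m_k * yk.var() * Sigma_inv @ Omega_k @ Sigma_inv
    return Gamma
```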
Learning gradients: Estimating the gradient

Taylor expansion:
$$y_i \approx f(x_i) \approx f(x_j) + \langle \nabla f(x_j),\, x_i - x_j \rangle \approx y_j + \langle \nabla f(x_j),\, x_i - x_j \rangle \quad \text{if } x_i \approx x_j.$$

Let $\vec{f} \approx \nabla f$. The following should then be small:
$$\sum_{i,j} w_{ij} \left( y_i - y_j - \langle \vec{f}(x_j),\, x_i - x_j \rangle \right)^2,$$
where $w_{ij} = \frac{1}{s^{p+2}} \exp\!\left( - \frac{\|x_i - x_j\|^2}{2 s^2} \right)$ enforces $x_i \approx x_j$.
Learning gradients: Estimating the gradient

The gradient estimate
$$\hat{\vec{f}}_D = \arg\min_{\vec{f} \in \mathcal{H}_K^p} \left[ \frac{1}{n^2} \sum_{i,j=1}^n w_{ij} \left( y_i - y_j - \vec{f}(x_j)^T (x_i - x_j) \right)^2 + \lambda \|\vec{f}\|_K^2 \right],$$
where $\|\vec{f}\|_K$ is a smoothness penalty, the reproducing kernel Hilbert space norm.

Go to board.
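As a rough illustration of the same loss, the sketch below solves a pointwise ridge-regularized version of the problem above: at each $x_j$ it fits a local linear model with the Gaussian weights $w_{ij}$, rather than the full RKHS vector-field estimator from the slide. The bandwidth $s$ and penalty `lam` are illustrative.

```python
import numpy as np

def estimate_gradients(X, y, s=1.0, lam=1e-3):
    """At each x_j, minimize sum_i w_ij (y_i - y_j - g^T (x_i - x_j))^2 + lam * ||g||^2
    over g; the minimizer is a weighted ridge regression in closed form.
    This is a pointwise simplification, not the RKHS estimator itself."""
    n, p = X.shape
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dist / (2 * s ** 2))              # localization weights (constant factor dropped)

    grads = np.zeros((n, p))
    for j in range(n):
        D = X - X[j]                                 # rows are x_i - x_j
        w = W[j]
        A = (D * w[:, None]).T @ D + lam * np.eye(p)
        b = (D * w[:, None]).T @ (y - y[j])          # sum_i w_ij (y_i - y_j)(x_i - x_j)
        grads[j] = np.linalg.solve(A, b)
    return grads
```

With the toy data generated earlier, `dr_directions(gop(estimate_gradients(X, y, s=1.0)), d=2)` should return directions whose span is close to span $B$.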