

  1. Some improvements of the SIR method for the estimation of Mars physical properties from hyperspectral images. Stéphane Girard, Mistis team, INRIA Grenoble Rhône-Alpes, http://mistis.inrialpes.fr/~girard. Joint work with the CQFD team, INRIA Bordeaux Sud-Ouest, and the Laboratoire de Planétologie de Grenoble.

  2. Outline: (1) Sliced Inverse Regression (SIR); (2) Regularization of SIR; (3) SIR for data streams; (4) Application to real data.

  3. Outline: (1) Sliced Inverse Regression (SIR); (2) Regularization of SIR; (3) SIR for data streams; (4) Application to real data.

  4. Multivariate regression. Let Y ∈ R and X ∈ R^p. The goal is to estimate G : R^p → R such that Y = G(X) + ξ, where ξ is independent of X. This is unrealistic when p is large (curse of dimensionality). Dimension reduction: replace X by its projection on a subspace of lower dimension, without loss of information on the distribution of Y given X. Central subspace: the smallest subspace S such that, conditionally on the projection of X on S, Y and X are independent.

  5. Dimension reduction. Assume (for the sake of simplicity) that dim(S) = 1, i.e. S = span(b) with b ∈ R^p ⇒ single index model: Y = g(b^t X) + ξ, where ξ is independent of X. The estimation of the p-variate function G is replaced by the estimation of the univariate function g and of the direction b. Goal of SIR [Li, 1991]: estimate a basis of the central subspace (i.e. b in this particular case).

  6. SIR. Idea: find the direction b such that b^t X best explains Y. Conversely, when Y is fixed, b^t X should not vary, so we look for the direction b minimizing the variations of b^t X given Y. In practice: the support of Y is divided into h slices S_j, and the within-slice variance of b^t X is minimized under the constraint var(b^t X) = 1, which is equivalent to maximizing the between-slice variance under the same constraint.

  7. Illustration.

  8. Estimation procedure. Given a sample {(X_1, Y_1), ..., (X_n, Y_n)}, the direction b is estimated by b̂ such that
      b̂ = argmax_b b^t Γ̂ b  subject to  b^t Σ̂ b = 1,   (1)
where Σ̂ is the empirical covariance matrix and Γ̂ is the between-slice covariance matrix defined by
      Γ̂ = (1/n) Σ_{j=1}^{h} n_j (X̄_j − X̄)(X̄_j − X̄)^t,  with X̄_j = (1/n_j) Σ_{Y_i ∈ S_j} X_i,
where n_j is the number of observations in slice S_j. The optimization problem (1) has a closed-form solution: b̂ is the eigenvector of Σ̂^{−1} Γ̂ associated with the largest eigenvalue.
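
A minimal numerical sketch of this procedure may help fix ideas; the quantile-based slicing and the generalized eigensolver are illustrative choices, not the authors' implementation.

```python
# Minimal SIR sketch: estimate b as the leading eigenvector of Sigma_hat^{-1} Gamma_hat,
# solved here as the generalized eigenproblem Gamma_hat v = lambda Sigma_hat v.
import numpy as np
from scipy.linalg import eigh

def sir_direction(X, Y, h=10):
    n, p = X.shape
    x_bar = X.mean(axis=0)
    Sigma_hat = np.cov(X, rowvar=False, bias=True)         # empirical covariance
    # Divide the support of Y into h slices with (roughly) equal counts.
    edges = np.quantile(Y, np.linspace(0, 1, h + 1))
    idx = np.clip(np.digitize(Y, edges[1:-1]), 0, h - 1)   # slice index of each Y_i
    Gamma_hat = np.zeros((p, p))
    for j in range(h):
        n_j = np.sum(idx == j)
        if n_j == 0:
            continue
        d = X[idx == j].mean(axis=0) - x_bar               # slice mean minus global mean
        Gamma_hat += (n_j / n) * np.outer(d, d)            # between-slice covariance
    # Closed-form solution of (1): eigenvector for the largest generalized eigenvalue.
    eigvals, eigvecs = eigh(Gamma_hat, Sigma_hat)          # eigenvalues in ascending order
    return eigvecs[:, -1]
```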

  9. Illustration on simulated data. Sample {(X_1, Y_1), ..., (X_n, Y_n)} of size n = 100 with X_i ∈ R^p and Y_i ∈ R, i = 1, ..., n. X_i ∼ N_p(0, Σ) where Σ = Q Δ Q^t with Δ = diag(p^θ, ..., 2^θ, 1^θ); θ controls the decreasing rate of the eigenvalue scree plot, and Q is an orientation matrix drawn from the uniform distribution on the set of orthogonal matrices. Y_i = g(b^t X_i) + ξ, where g is the link function g(t) = sin(πt/2), b is the true direction b = 5^{−1/2} Q (1, 1, 1, 1, 1, 0, ..., 0)^t, and ξ ∼ N_1(0, 9·10^{−4}).
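
A sketch of this simulation setup, under the parameter values stated above; drawing the random orthogonal Q via a QR factorization is one possible choice, made here only for illustration.

```python
# Simulated data as described above: X_i ~ N_p(0, Sigma), Y_i = sin(pi b^t X_i / 2) + noise.
import numpy as np

rng = np.random.default_rng(0)
n, p, theta = 100, 10, 2.0

Q, _ = np.linalg.qr(rng.standard_normal((p, p)))             # random orientation matrix
Delta = np.diag(np.arange(p, 0, -1, dtype=float) ** theta)   # diag(p^theta, ..., 1^theta)
Sigma = Q @ Delta @ Q.T

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
b = Q @ np.concatenate([np.ones(5), np.zeros(p - 5)]) / np.sqrt(5)   # true direction
Y = np.sin(np.pi * (X @ b) / 2) + rng.normal(0.0, np.sqrt(9e-4), size=n)

# b_hat = sir_direction(X, Y)   # e.g. using the sketch from the previous slide
```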

  10. Results with θ = 2, dimension p = 10. Blue: Y_i versus the projections b^t X_i on the true direction b. Red: Y_i versus the projections b̂^t X_i on the estimated direction b̂. Green: b̂^t X_i versus b^t X_i.

  11. Results with θ = 2, dimension p = 50. Blue: Y_i versus the projections b^t X_i on the true direction b. Red: Y_i versus the projections b̂^t X_i on the estimated direction b̂. Green: b̂^t X_i versus b^t X_i.

  12. Explanation. Problem: Σ̂ may be singular, or at least ill-conditioned, in several situations. Since rank(Σ̂) ≤ min(n − 1, p), if n ≤ p then Σ̂ is singular. Even if n and p are of the same order, Σ̂ is ill-conditioned, and its inversion yields numerical problems in the estimation of the central subspace. The same phenomenon occurs if the coordinates of X are strongly correlated. In the previous example, the condition number of Σ was p^θ.
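
The ill-conditioning can be observed directly on the simulated design. The small check below (with illustrative values n = 100, p = 50) compares the population condition number p^θ with that of the empirical covariance when n and p are of the same order.

```python
# The population condition number of Sigma is p^theta; the empirical covariance,
# estimated from only n = 100 points in dimension p = 50, is far worse conditioned.
import numpy as np

rng = np.random.default_rng(0)
n, p, theta = 100, 50, 2.0
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
Sigma = Q @ np.diag(np.arange(p, 0, -1, dtype=float) ** theta) @ Q.T
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = np.cov(X, rowvar=False, bias=True)
print(p ** theta, np.linalg.cond(Sigma), np.linalg.cond(Sigma_hat))
```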

  13. Outline: (1) Sliced Inverse Regression (SIR); (2) Regularization of SIR; (3) SIR for data streams; (4) Application to real data.

  14. Inverse regression model. Model introduced in [Cook, 2007]:
      X = µ + c(Y) V b + ε,   (2)
where µ and b are vectors of R^p, ε ∼ N_p(0, V) is independent of Y, and c : R → R is the coordinate function. Consequence: the expectation of X − µ given Y is collinear to the direction V b.
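
For concreteness, data can be drawn from model (2) as follows; the coordinate function c and all parameter values here are assumptions made only for illustration.

```python
# Illustrative simulation from the inverse regression model (2),
# with an assumed coordinate function c(y) = y.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
mu, b = np.zeros(p), np.eye(p)[0]               # assumed values, for illustration only
V = 0.1 * np.eye(p)                             # noise covariance
c = lambda y: y                                 # assumed coordinate function

Y = rng.uniform(-1, 1, size=n)
eps = rng.multivariate_normal(np.zeros(p), V, size=n)
X = mu + np.outer(c(Y), V @ b) + eps
# E[X - mu | Y] = c(Y) V b, i.e. collinear to the direction V b, as stated above.
```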

  15. Maximum likelihood estimation (1/3). c(·) is expanded as a linear combination of h basis functions s_j(·):
      c(·) = Σ_{j=1}^{h} c_j s_j(·) = s^t(·) c,
where c = (c_1, ..., c_h)^t is unknown and s(·) = (s_1(·), ..., s_h(·))^t. Model (2) can be rewritten as
      X = µ + s^t(Y) c V b + ε,  ε ∼ N_p(0, V).

  16. Maximum likelihood estimation (2/3). Notations. W: the h × h empirical covariance matrix of s(Y), defined by
      W = (1/n) Σ_{i=1}^{n} (s(Y_i) − s̄)(s(Y_i) − s̄)^t,  with s̄ = (1/n) Σ_{i=1}^{n} s(Y_i).
M: the h × p matrix defined by
      M = (1/n) Σ_{i=1}^{n} (s(Y_i) − s̄)(X_i − X̄)^t.

  17. Maximum likelihood estimation (3/3). If W and Σ̂ are regular, then the maximum likelihood estimator of b is b̂, the eigenvector associated with the largest eigenvalue of Σ̂^{−1} M^t W^{−1} M. ⇒ The inversion of Σ̂ is necessary. In the particular case of piecewise-constant basis functions s_j(·) = I{· ∈ S_j}, j = 1, ..., h, it can be shown that M^t W^{−1} M = Γ̂, and thus b̂ is the eigenvector associated with the largest eigenvalue of Σ̂^{−1} Γ̂. ⇒ This recovers the SIR method.
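
A small numerical check of the stated equivalence for the piecewise-constant basis; all data, sizes and names below are illustrative. Note that the full set of h indicators sums to one, so W has rank h − 1 and a pseudo-inverse stands in for W^{−1} in this sketch.

```python
# Check that M^t W^{-1} M coincides with Gamma_hat for s_j(y) = I{y in S_j}.
import numpy as np

rng = np.random.default_rng(0)
n, p, h = 200, 5, 10
X = rng.standard_normal((n, p))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# Slice assignment and the n x h matrix of indicators s(Y_i).
edges = np.quantile(Y, np.linspace(0, 1, h + 1))
idx = np.clip(np.digitize(Y, edges[1:-1]), 0, h - 1)
S = np.eye(h)[idx]

# W (h x h) and M (h x p) as defined on the previous slide.
Sc, Xc = S - S.mean(axis=0), X - X.mean(axis=0)
W, M = Sc.T @ Sc / n, Sc.T @ Xc / n

# Gamma_hat as on slide 8: between-slice covariance matrix.
Gamma_hat = sum((idx == j).mean() *
                np.outer(Xc[idx == j].mean(axis=0), Xc[idx == j].mean(axis=0))
                for j in range(h))

print(np.allclose(M.T @ np.linalg.pinv(W) @ M, Gamma_hat))   # expected: True
```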

  18. Regularized SIR. Introduction of a Gaussian prior N(0, Ω) on the unknown vector b; Ω describes which directions in R^p are more likely to contain b. If W and Ω Σ̂ + I_p are regular, then b̂ is the eigenvector associated with the largest eigenvalue of (Ω Σ̂ + I_p)^{−1} Ω M^t W^{−1} M. In the particular case where the basis functions are piecewise constant, b̂ is the eigenvector associated with the largest eigenvalue of (Ω Σ̂ + I_p)^{−1} Ω Γ̂. ⇒ The inversion of Σ̂ is replaced by the inversion of Ω Σ̂ + I_p. ⇒ For a well-chosen a priori matrix Ω, the numerical problems disappear.
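
A sketch of the resulting regularized eigenproblem; the helper below is illustrative and is meant to be used with a prior matrix Ω such as those on the next slides.

```python
# Regularized SIR sketch: b_hat is the leading eigenvector of
# (Omega Sigma_hat + I_p)^{-1} Omega Gamma_hat.
import numpy as np

def regularized_sir_direction(Sigma_hat, Gamma_hat, Omega):
    p = Sigma_hat.shape[0]
    A = np.linalg.solve(Omega @ Sigma_hat + np.eye(p), Omega @ Gamma_hat)
    eigvals, eigvecs = np.linalg.eig(A)                 # A is not symmetric in general
    b_hat = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return b_hat / np.linalg.norm(b_hat)
```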

  19. Links with existing methods. Ridge [Zhong et al., 2005]: Ω = τ^{−1} I_p; no privileged direction for b in R^p; τ > 0 is a regularization parameter. PCA+SIR [Chiaromonte et al., 2002]:
      Ω = Σ_{j=1}^{d} (1/δ̂_j) q̂_j q̂_j^t,
where d ∈ {1, ..., p} is fixed, δ̂_1 ≥ · · · ≥ δ̂_d are the d largest eigenvalues of Σ̂, and q̂_1, ..., q̂_d are the associated eigenvectors.

  20. Three new methods. PCA+ridge:
      Ω = (1/τ) Σ_{j=1}^{d} q̂_j q̂_j^t;
in the eigenspace of dimension d, all the directions are a priori equivalent. Tikhonov: Ω = τ^{−1} Σ̂; the directions with large variance are the most likely to contain b. PCA+Tikhonov:
      Ω = (1/τ) Σ_{j=1}^{d} δ̂_j q̂_j q̂_j^t;
in the eigenspace of dimension d, the directions with large variance are the most likely to contain b.
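
The five prior matrices above can all be assembled from the spectral decomposition of Σ̂; a sketch follows, where τ, d and the function and key names are illustrative choices.

```python
# Illustrative construction of the prior matrices Omega listed on slides 19-20.
import numpy as np

def omega_matrices(Sigma_hat, tau, d):
    p = Sigma_hat.shape[0]
    delta, q = np.linalg.eigh(Sigma_hat)        # ascending eigenvalues
    delta, q = delta[::-1], q[:, ::-1]          # reorder: delta_1 >= ... >= delta_p
    Qd = q[:, :d]                               # d leading eigenvectors
    return {
        "ridge":        np.eye(p) / tau,
        "PCA+SIR":      Qd @ np.diag(1.0 / delta[:d]) @ Qd.T,
        "PCA+ridge":    Qd @ Qd.T / tau,
        "Tikhonov":     Sigma_hat / tau,
        "PCA+Tikhonov": Qd @ np.diag(delta[:d]) @ Qd.T / tau,
    }

# e.g. Omega = omega_matrices(Sigma_hat, tau=0.1, d=20)["PCA+ridge"]
```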

  21. Recall of SIR results with θ = 2 and p = 50. Blue: projections b^t X_i on the true direction b versus Y_i. Red: projections b̂^t X_i on the estimated direction b̂ versus Y_i. Green: b^t X_i versus b̂^t X_i.

  22. Regularized SIR results (PCA+ridge). Blue: projections b^t X_i on the true direction b versus Y_i. Red: projections b̂^t X_i on the estimated direction b̂ versus Y_i. Green: b^t X_i versus b̂^t X_i.

  23. Validation on simulations. Proximity criterion between the true direction b and the estimated ones b̂^(r) over N = 100 replications:
      PC = (1/N) Σ_{r=1}^{N} cos²(b, b̂^(r)),  with 0 ≤ PC ≤ 1.
A value close to 0 implies a low proximity: the b̂^(r) are nearly orthogonal to b. A value close to 1 implies a high proximity: the b̂^(r) are approximately collinear with b.
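
A sketch of this criterion, assuming the N replicated estimates are stacked as the rows of an array.

```python
# Proximity criterion PC: mean squared cosine between the true direction b
# and the N estimated directions (rows of B_hat).
import numpy as np

def proximity(b, B_hat):
    b = b / np.linalg.norm(b)
    B = B_hat / np.linalg.norm(B_hat, axis=1, keepdims=True)
    return np.mean((B @ b) ** 2)                # PC in [0, 1]
```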

  24. Sensitivity with respect to the "cut-off" dimension d. Plot: d versus PC, with the condition number fixed (θ = 2) and the optimal regularization parameter used for each value of d. PCA+SIR: very sensitive to d. PCA+ridge and PCA+Tikhonov: stable as d increases.

  25. Outline: (1) Sliced Inverse Regression (SIR); (2) Regularization of SIR; (3) SIR for data streams; (4) Application to real data.

  26. Context. We consider data arriving sequentially by blocks in a stream. Each data block j = 1, ..., J is an i.i.d. sample (X_i, Y_i), i = 1, ..., n, from the regression model (2). Goal: update the estimate of the direction b at each arrival of a new block of observations.

  27. Method. Compute the individual directions b̂_j on each block j = 1, ..., J using regularized SIR. Compute a common direction as
      b̂ = argmax_{||b|| = 1} Σ_{j=1}^{J} cos²(b̂_j, b) cos²(b̂_j, b̂_J).
Idea: if b̂_j is close to b̂_J, then b̂ should be close to b̂_j. Explicit solution: b̂ is the eigenvector associated with the largest eigenvalue of
      M_J = Σ_{j=1}^{J} cos²(b̂_j, b̂_J) b̂_j b̂_j^t.
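
A sketch of this update, assuming the per-block directions b̂_1, ..., b̂_J are available as a list of vectors (normalized to unit length here so that cos²(b̂_j, b̂_J) reduces to a squared dot product).

```python
# Data-stream update sketch: common direction as the leading eigenvector of
# M_J = sum_j cos^2(b_j, b_J) b_j b_j^t.
import numpy as np

def common_direction(b_list):
    B = np.array([b / np.linalg.norm(b) for b in b_list])   # J x p, unit rows
    w = (B @ B[-1]) ** 2                                     # weights cos^2(b_j, b_J)
    M_J = (B * w[:, None]).T @ B                             # sum_j w_j b_j b_j^t
    eigvals, eigvecs = np.linalg.eigh(M_J)                   # ascending eigenvalues
    return eigvecs[:, -1]                                    # largest eigenvalue
```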

  28. Advantages of SIRdatastream. Computational complexity: O(Jnp²) vs. O(J²np²) for the brute-force method, which would consist in applying regularized SIR to the union of the first j blocks for j = 1, ..., J. Data storage: O(np) vs. O(Jnp) for the brute-force method (under the assumption n >> max(J, p)). Interpretation of the weights cos²(b̂_j, b̂_J).
