Some improvements of the SIR method for the estimation of Mars physical properties from hyperspectral images
Stéphane Girard, Mistis team, INRIA Grenoble Rhône-Alpes. http://mistis.inrialpes.fr/~girard
Joint work with the CQFD team, INRIA Bordeaux Sud-Ouest, and the Laboratoire de Planétologie de Grenoble.
Outline
1. Sliced Inverse Regression (SIR)
2. Regularization of SIR
3. SIR for data streams
4. Application to real data
Multivariate regression
Let Y ∈ R and X ∈ R^p. The goal is to estimate G : R^p → R such that Y = G(X) + ξ, where ξ is independent of X.
Unrealistic when p is large (curse of dimensionality).
Dimension reduction: replace X by its projection onto a subspace of lower dimension, without loss of information on the distribution of Y given X.
Central subspace: the smallest subspace S such that, conditionally on the projection of X onto S, Y and X are independent.
Dimension reduction
Assume (for the sake of simplicity) that dim(S) = 1, i.e. S = span(b) with b ∈ R^p.
⟹ Single index model: Y = g(b^t X) + ξ, where ξ is independent of X.
The estimation of the p-variate function G is replaced by the estimation of the univariate function g and of the direction b.
Goal of SIR [Li, 1991]: estimate a basis of the central subspace (i.e. b in this particular case).
SIR
Idea: find the direction b such that b^t X best explains Y. Conversely, when Y is fixed, b^t X should not vary: find the direction b minimizing the variations of b^t X given Y.
In practice: the support of Y is divided into h slices S_j. Minimize the within-slice variance of b^t X under the constraint var(b^t X) = 1. This is equivalent to maximizing the between-slice variance under the same constraint.
Illustration (figure).
Estimation procedure
Given a sample {(X_1, Y_1), ..., (X_n, Y_n)}, the direction b is estimated by b̂ such that

\hat{b} = \operatorname*{argmax}_{b} \; b^t \hat{\Gamma} b \quad \text{subject to} \quad b^t \hat{\Sigma} b = 1, \qquad (1)

where Σ̂ is the empirical covariance matrix and Γ̂ is the between-slice covariance matrix defined by

\hat{\Gamma} = \frac{1}{n} \sum_{j=1}^{h} n_j (\bar{X}_j - \bar{X})(\bar{X}_j - \bar{X})^t, \qquad \bar{X}_j = \frac{1}{n_j} \sum_{Y_i \in S_j} X_i,

where n_j is the number of observations in the slice S_j.
The optimization problem (1) has a closed-form solution: b̂ is the eigenvector of Σ̂^{-1} Γ̂ associated with the largest eigenvalue.
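A minimal numerical sketch of this estimation procedure. The slicing of Y by equal-count quantiles, the function names, and the interface are illustrative assumptions, not the authors' code; it further assumes n > p so that Σ̂ is invertible.

```python
import numpy as np

def between_slice_cov(X, Y, n_slices=10):
    """Between-slice covariance matrix Gamma-hat; slices are built from the
    quantiles of Y (illustrative choice, the slicing scheme is an assumption)."""
    n, p = X.shape
    X_bar = X.mean(axis=0)
    edges = np.quantile(Y, np.linspace(0, 1, n_slices + 1))
    edges[-1] += 1e-12                                   # include the maximum in the last slice
    Gamma = np.zeros((p, p))
    for j in range(n_slices):
        mask = (Y >= edges[j]) & (Y < edges[j + 1])
        n_j = mask.sum()
        if n_j == 0:
            continue
        d = X[mask].mean(axis=0) - X_bar
        Gamma += (n_j / n) * np.outer(d, d)              # n_j/n weighted slice means
    return Gamma

def sir_direction(X, Y, n_slices=10):
    """Basic SIR: leading eigenvector of Sigma^{-1} Gamma
    (assumes n > p so that the empirical covariance is invertible)."""
    Sigma = np.cov(X, rowvar=False, bias=True)           # empirical covariance Sigma-hat
    Gamma = between_slice_cov(X, Y, n_slices)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sigma, Gamma))
    b_hat = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return b_hat / np.linalg.norm(b_hat)
```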
Illustration
Simulated data: sample {(X_1, Y_1), ..., (X_n, Y_n)} of size n = 100 with X_i ∈ R^p and Y_i ∈ R, i = 1, ..., n.
X_i ~ N_p(0, Σ), where Σ = Q Δ Q^t with Δ = diag(p^θ, ..., 2^θ, 1^θ); θ controls the decreasing rate of the eigenvalue screeplot; Q is an orientation matrix drawn from the uniform distribution on the set of orthogonal matrices.
Y_i = g(b^t X_i) + ξ, where g is the link function g(t) = sin(πt/2), b is the true direction b = 5^{-1/2} Q (1, 1, 1, 1, 1, 0, ..., 0)^t, and ξ ~ N_1(0, 9·10^{-4}).
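One possible way to generate this simulated data, given as a hedged sketch: `ortho_group` draws a Haar-uniform orthogonal matrix, and the function name, argument names, and seed are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ortho_group   # uniform distribution on orthogonal matrices

def simulate(n=100, p=10, theta=2, noise_var=9e-4, seed=0):
    """Simulated data of the slide: X ~ N_p(0, Q diag(p^theta, ..., 1^theta) Q^t),
    Y = sin(pi * b^t X / 2) + noise."""
    rng = np.random.default_rng(seed)
    Q = ortho_group.rvs(p, random_state=seed)            # orientation matrix
    Delta = np.diag(np.arange(p, 0, -1, dtype=float) ** theta)
    Sigma = Q @ Delta @ Q.T
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    b = Q @ np.r_[np.ones(5), np.zeros(p - 5)] / np.sqrt(5)   # true direction
    Y = np.sin(np.pi * (X @ b) / 2) + rng.normal(0, np.sqrt(noise_var), size=n)
    return X, Y, b
```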
Results with θ = 2, dimension p = 10
Blue: Y_i versus the projections b^t X_i on the true direction b.
Red: Y_i versus the projections b̂^t X_i on the estimated direction b̂.
Green: b̂^t X_i versus b^t X_i.
Results with θ = 2, dimension p = 50
Blue: Y_i versus the projections b^t X_i on the true direction b.
Red: Y_i versus the projections b̂^t X_i on the estimated direction b̂.
Green: b̂^t X_i versus b^t X_i.
Explanation
Problem: Σ̂ may be singular, or at least ill-conditioned, in several situations.
Since rank(Σ̂) ≤ min(n − 1, p), if n ≤ p then Σ̂ is singular.
Even if n and p are of the same order, Σ̂ is ill-conditioned, and its inversion yields numerical problems in the estimation of the central subspace.
The same phenomenon occurs if the coordinates of X are strongly correlated.
In the previous example, the condition number of Σ was p^θ.
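A quick numerical check of this conditioning issue, reusing the `simulate` sketch above; the printed quantities and parameter values are illustrative, not results from the slides.

```python
import numpy as np

# With p = 50 and theta = 2, the condition number of the true Sigma is p**theta = 2500,
# and the empirical covariance of a small sample is typically even worse conditioned.
X, Y, b = simulate(n=100, p=50, theta=2)      # simulate() from the sketch above
Sigma_hat = np.cov(X, rowvar=False, bias=True)
print(np.linalg.cond(Sigma_hat))              # very large -> unstable inversion
print(np.linalg.matrix_rank(Sigma_hat))       # at most min(n - 1, p)
```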
Outline
1. Sliced Inverse Regression (SIR)
2. Regularization of SIR
3. SIR for data streams
4. Application to real data
Inverse regression model
Model introduced in [Cook, 2007]:

X = \mu + c(Y) V b + \varepsilon, \qquad (2)

where μ and b are vectors of R^p, ε ~ N_p(0, V) is independent of Y, and c : R → R is the coordinate function.
Consequence: the expectation of X − μ given Y is collinear to the direction V b.
Maximum likelihood estimation (1/3)
c(·) is expanded as a linear combination of h basis functions s_j(·):

c(\cdot) = \sum_{j=1}^{h} c_j s_j(\cdot) = s^t(\cdot)\, c,

where c = (c_1, ..., c_h)^t is unknown and s(·) = (s_1(·), ..., s_h(·))^t.
Model (2) can be rewritten as X = μ + s^t(Y) c V b + ε, with ε ~ N_p(0, V).
Maximum likelihood estimation (2/3)
Notations:
W: the h × h empirical covariance matrix of s(Y), defined by

W = \frac{1}{n} \sum_{i=1}^{n} (s(Y_i) - \bar{s})(s(Y_i) - \bar{s})^t, \qquad \bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(Y_i).

M: the h × p matrix defined by

M = \frac{1}{n} \sum_{i=1}^{n} (s(Y_i) - \bar{s})(X_i - \bar{X})^t.
Maximum likelihood estimation (3/3)
If W and Σ̂ are regular, then the maximum likelihood estimator of b is b̂, the eigenvector associated with the largest eigenvalue of Σ̂^{-1} M^t W^{-1} M.
⟹ The inversion of Σ̂ is necessary.
In the particular case of piecewise constant basis functions s_j(·) = I{· ∈ S_j}, j = 1, ..., h, it can be shown that M^t W^{-1} M = Γ̂, and thus b̂ is the eigenvector associated with the largest eigenvalue of Σ̂^{-1} Γ̂.
⟹ The SIR method.
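A sketch of this estimator for a general basis; the argument `S_basis` (the n × h matrix of basis functions evaluated at the Y_i) and the function name are assumptions made for illustration. With indicator basis functions (row i has a 1 in the column of the slice containing Y_i, zeros elsewhere), it reduces to the SIR estimator of the previous slides.

```python
import numpy as np

def ml_direction(X, S_basis):
    """ML estimator of b in the inverse regression model: leading eigenvector
    of Sigma^{-1} M^t W^{-1} M.  Assumes W and Sigma-hat are invertible."""
    n, p = X.shape
    Sc = S_basis - S_basis.mean(axis=0)                  # centred s(Y_i), shape n x h
    Xc = X - X.mean(axis=0)
    W = (Sc.T @ Sc) / n                                  # h x h covariance of s(Y)
    M = (Sc.T @ Xc) / n                                  # h x p cross matrix
    Sigma = (Xc.T @ Xc) / n                              # empirical covariance Sigma-hat
    K = M.T @ np.linalg.solve(W, M)                      # M^t W^{-1} M
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sigma, K))
    b_hat = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return b_hat / np.linalg.norm(b_hat)
```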
Regularized SIR
Introduction of a Gaussian prior N(0, Ω) on the unknown vector b. Ω describes which directions in R^p are more likely to contain b.
If W and ΩΣ̂ + I_p are regular, then b̂ is the eigenvector associated with the largest eigenvalue of (ΩΣ̂ + I_p)^{-1} Ω M^t W^{-1} M.
In the particular case where the basis functions are piecewise constant, b̂ is the eigenvector associated with the largest eigenvalue of (ΩΣ̂ + I_p)^{-1} Ω Γ̂.
⟹ The inversion of Σ̂ is replaced by the inversion of ΩΣ̂ + I_p.
⟹ For a well-chosen a priori matrix Ω, the numerical problems disappear.
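A hedged sketch of the piecewise-constant case, reusing `between_slice_cov` from the basic SIR sketch above; the function name and interface are illustrative assumptions.

```python
import numpy as np

def regularized_sir(X, Y, Omega, n_slices=10):
    """Regularized SIR: leading eigenvector of (Omega Sigma + I_p)^{-1} Omega Gamma,
    where Omega is the covariance of the Gaussian prior N(0, Omega) on b."""
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False, bias=True)
    Gamma = between_slice_cov(X, Y, n_slices)            # helper from the basic SIR sketch
    A = np.linalg.solve(Omega @ Sigma + np.eye(p), Omega @ Gamma)
    eigvals, eigvecs = np.linalg.eig(A)
    b_hat = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return b_hat / np.linalg.norm(b_hat)
```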
Links with existing methods
Ridge [Zhong et al, 2005]: Ω = τ^{-1} I_p. No privileged direction for b in R^p; τ > 0 is a regularization parameter.
PCA+SIR [Chiaromonte et al, 2002]:

\Omega = \sum_{j=1}^{d} \frac{1}{\hat{\delta}_j} \hat{q}_j \hat{q}_j^t,

where d ∈ {1, ..., p} is fixed, δ̂_1 ≥ ... ≥ δ̂_d are the d largest eigenvalues of Σ̂ and q̂_1, ..., q̂_d are the associated eigenvectors.
Three new methods
PCA+ridge:

\Omega = \frac{1}{\tau} \sum_{j=1}^{d} \hat{q}_j \hat{q}_j^t.

In the eigenspace of dimension d, all the directions are a priori equivalent.
Tikhonov: Ω = τ^{-1} Σ̂. The directions with large variance are the most likely to contain b.
PCA+Tikhonov:

\Omega = \frac{1}{\tau} \sum_{j=1}^{d} \hat{\delta}_j \hat{q}_j \hat{q}_j^t.

In the eigenspace of dimension d, the directions with large variance are the most likely to contain b.
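The different priors translate directly into code; below is a minimal sketch constructing Ω for each variant. The function name and the string labels for `kind` are illustrative assumptions.

```python
import numpy as np

def make_prior(Sigma_hat, kind="ridge", tau=1.0, d=None):
    """Prior covariance Omega for the regularized SIR variants listed above."""
    p = Sigma_hat.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)         # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort by decreasing eigenvalue
    if kind == "ridge":                                  # Omega = I_p / tau
        return np.eye(p) / tau
    if kind == "tikhonov":                               # Omega = Sigma_hat / tau
        return Sigma_hat / tau
    Q_d, delta_d = eigvecs[:, :d], eigvals[:d]           # d leading eigen-pairs
    if kind == "pca+sir":                                # sum_j (1/delta_j) q_j q_j^t
        return Q_d @ np.diag(1.0 / delta_d) @ Q_d.T
    if kind == "pca+ridge":                              # (1/tau) sum_j q_j q_j^t
        return (Q_d @ Q_d.T) / tau
    if kind == "pca+tikhonov":                           # (1/tau) sum_j delta_j q_j q_j^t
        return (Q_d @ np.diag(delta_d) @ Q_d.T) / tau
    raise ValueError(kind)
```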
Recall of SIR results with θ = 2 and p = 50
Blue: projections b^t X_i on the true direction b versus Y_i.
Red: projections b̂^t X_i on the estimated direction b̂ versus Y_i.
Green: b^t X_i versus b̂^t X_i.
Regularized SIR results (PCA+ridge)
Blue: projections b^t X_i on the true direction b versus Y_i.
Red: projections b̂^t X_i on the estimated direction b̂ versus Y_i.
Green: b^t X_i versus b̂^t X_i.
Validation on simulations
Proximity criterion between the true direction b and the estimated ones b̂^(r) over N = 100 replications:

PC = \frac{1}{N} \sum_{r=1}^{N} \cos^2(b, \hat{b}^{(r)}), \qquad 0 \leq PC \leq 1.

A value close to 0 implies a low proximity: the b̂^(r) are nearly orthogonal to b.
A value close to 1 implies a high proximity: the b̂^(r) are approximately collinear with b.
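The criterion is straightforward to compute; here is a small illustrative helper (name and interface are assumptions), where `b_hats` stacks the N replicated estimates as rows.

```python
import numpy as np

def proximity_criterion(b_true, b_hats):
    """PC = average squared cosine between the true direction and the
    estimates from the N replications (rows of b_hats)."""
    b = b_true / np.linalg.norm(b_true)
    cosines = (b_hats @ b) / np.linalg.norm(b_hats, axis=1)
    return float(np.mean(cosines ** 2))
```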
Sensitivity with respect to the "cut-off" dimension: d versus PC.
The condition number is fixed (θ = 2). The optimal regularization parameter is used for each value of d.
PCA+SIR: very sensitive to d.
PCA+ridge and PCA+Tikhonov: stable as d increases.
Outline
1. Sliced Inverse Regression (SIR)
2. Regularization of SIR
3. SIR for data streams
4. Application to real data
Context
We consider data arriving sequentially by blocks in a stream. Each data block j = 1, ..., J is an i.i.d. sample (X_i, Y_i), i = 1, ..., n, from the regression model (2).
Goal: update the estimation of the direction b at each arrival of a new block of observations.
Method
Compute the individual directions b̂_j on each block j = 1, ..., J using regularized SIR.
Compute a common direction as

\hat{b} = \operatorname*{argmax}_{\|b\| = 1} \sum_{j=1}^{J} \cos^2(\hat{b}_j, b) \cos^2(\hat{b}_j, \hat{b}_J).

Idea: if b̂_j is close to b̂_J, then b̂ should be close to b̂_j.
Explicit solution: b̂ is the eigenvector associated with the largest eigenvalue of

M_J = \sum_{j=1}^{J} \cos^2(\hat{b}_j, \hat{b}_J) \, \hat{b}_j \hat{b}_j^t.
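A minimal sketch of this combination step, assuming the per-block directions are already available; the function name and the convention that the last row is the newest block's direction b̂_J are illustrative assumptions.

```python
import numpy as np

def combine_directions(b_blocks):
    """Common direction from per-block directions b_1, ..., b_J (rows of b_blocks):
    leading eigenvector of M_J = sum_j cos^2(b_j, b_J) b_j b_j^t."""
    B = b_blocks / np.linalg.norm(b_blocks, axis=1, keepdims=True)
    b_last = B[-1]                                       # direction of the newest block, b_J
    weights = (B @ b_last) ** 2                          # cos^2(b_j, b_J)
    M_J = (B * weights[:, None]).T @ B                   # sum_j weight_j b_j b_j^t
    eigvals, eigvecs = np.linalg.eigh(M_J)               # symmetric, ascending eigenvalues
    return eigvecs[:, -1]                                # eigenvector of the largest eigenvalue
```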
Advantages of SIRdatastream
Computational complexity: O(Jnp^2) vs. O(J^2 np^2) for the brute-force method, which would consist in applying regularized SIR on the union of the first j blocks for j = 1, ..., J.
Data storage: O(np) vs. O(Jnp) for the brute-force method (under the assumption n ≫ max(J, p)).
Interpretation of the weights cos^2(b̂_j, b̂_J).