Supervised Principal Component Regression for Functional Data with High Dimensional Predictors

Xinyi (Cindy) Zhang
University of Toronto
xyi.zhang@mail.utoronto.ca

July 10, 2018
Joint work with
Overview
1. Motivation
2. Methodology: SPCR
3. Theoretical Properties: Equivalence; Estimation Convergence
4. Numerical Studies: Simulation; Real Data Application
Motivation

Functional magnetic resonance imaging (fMRI) is a noninvasive technique for studying brain activity.

[Image courtesy of the Rebecca Saxe laboratory; MIT News, http://news.mit.edu/2011/brain-language-0301]
Motivation

The fMRI dataset for each subject contains a time series of 3-D images.

[Figure: example fMRI images, panels (a) and (b)]
Motivation

A high-dimensional set of clinical/demographic variables is also collected for each subject.

The association between these variables and brain activity has not been well understood.
Related Methodology: PCA

Principal component analysis (PCA) can be applied to extract a lower-dimensional subspace that captures most of the variation in the covariates.
Related Methodology: Potential problems

PCA fails to capture any information when the principal subspace extracted from the covariates is orthogonal to the vectors of regression parameters.

⇓

Supervised Principal Component Regression (SPCR)
Methodology: Some notation

Covariance matrix $\Sigma_x = E(XX^T)$; cross-covariance matrix $\Sigma_{xy} = \int_{\mathcal{T}} E\{XY(t)\}\,[E\{XY(t)\}]^T \, dt$, where $\mathcal{T}$ is a compact support.

$\{(X_i, Y_i(t)),\ i = 1, \ldots, n\} \overset{\text{iid}}{\sim} \{X, Y(t)\}$.

Empirical estimate $\widehat{\Sigma}_x = n^{-1} \mathbf{X}^T \mathbf{X}$, where $\mathbf{X} = (X_1, \ldots, X_n)^T \in \mathbb{R}^{n \times p}$.

Empirical estimate $\widehat{\Sigma}_{xy} = n^{-2} \int_{\mathcal{T}} \mathbf{X}^T \mathbf{Y}(t)\, \mathbf{Y}(t)^T \mathbf{X} \, dt$, where $\mathbf{Y}(t) = (Y_1(t), \ldots, Y_n(t))^T \in \mathbb{R}^n$.
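The notation maps directly to code; below is a minimal numpy sketch of the empirical estimators (the function name and the Riemann-sum approximation of the integral over $\mathcal{T}$ are my choices, assuming the responses are observed on a common grid):

```python
import numpy as np

def empirical_covariances(X, Y, t_grid):
    """X: (n, p) predictors; Y: (n, m) functional responses observed at
    t_grid (length m). Returns (Sigma_x_hat, Sigma_xy_hat)."""
    n, p = X.shape
    Sigma_x = X.T @ X / n                    # n^{-1} X^T X
    dt = np.gradient(t_grid)                 # quadrature weights for the integral
    Sigma_xy = np.zeros((p, p))
    for j, w in enumerate(dt):
        s = X.T @ Y[:, j]                    # X^T Y(t_j), shape (p,)
        Sigma_xy += w * np.outer(s, s)       # X^T Y(t) Y(t)^T X dt
    return Sigma_x, Sigma_xy / n**2
```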
Methodology

Start with $p < n$. Regressing $Y(t)$ on the projection $X^T w_1$, the optimal regression function $\gamma^*(t)$ is the minimizer of the expected integrated residual sum of squares, defined as
$$\mathrm{IRSS} = E\left[\int_{\mathcal{T}} \{Y(t) - X^T w_1 \gamma(t)\}^T \{Y(t) - X^T w_1 \gamma(t)\} \, dt\right].$$
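For a fixed direction $w_1$, the minimizer can be found pointwise in $t$ by ordinary least squares; a short derivation (treating $Y(t)$ as scalar-valued, an assumption made here for clarity): pointwise in $t$, IRSS is quadratic in $\gamma(t)$, so setting the derivative to zero gives
$$-2\, w_1^T E\{XY(t)\} + 2\, w_1^T E(XX^T) w_1 \, \gamma(t) = 0 \;\Longrightarrow\; \gamma^*(t) = \{w_1^T E(XX^T) w_1\}^{-1} w_1^T E\{XY(t)\},$$
which is the expression on the next slide.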
Methodology

$$\Longrightarrow \quad \gamma^*(t) = \{w_1^T E(XX^T) w_1\}^{-1} w_1^T E\{XY(t)\}.$$

Plugging $\gamma^*(t)$ into IRSS yields
$$\mathrm{IRSS}(\gamma^*) = \int_{\mathcal{T}} \left( E\{Y(t)^T Y(t)\} - [E\{XY(t)\}]^T w_1 \{w_1^T E(XX^T) w_1\}^{-1} w_1^T [E\{XY(t)\}] \right) dt.$$
Methodology

Among all possible directions of $w_1$, the one minimizing $\mathrm{IRSS}(\gamma^*)$ satisfies:

Proposition. If $w_{10}$ is a minimizer of $\mathrm{IRSS}(\gamma^*)$, then $w_{10}$ satisfies
$$w_{10} = \arg\max_{w_1} \frac{w_1^T \Sigma_{xy} w_1}{w_1^T \Sigma_x w_1}. \tag{2.1}$$
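Equation (2.1) is a generalized Rayleigh quotient, so at the population level its maximizer is the leading generalized eigenvector of the pencil $(\Sigma_{xy}, \Sigma_x)$. A minimal sketch (the helper name is mine, not the paper's):

```python
import numpy as np
from scipy.linalg import eigh

def leading_direction(Sigma_xy, Sigma_x):
    """Maximize w^T Sigma_xy w / w^T Sigma_x w via the generalized
    eigenproblem Sigma_xy v = lambda * Sigma_x v (Sigma_x must be
    positive definite)."""
    vals, vecs = eigh(Sigma_xy, Sigma_x)   # eigenvalues in ascending order
    w = vecs[:, -1]                        # leading generalized eigenvector
    return w / np.sqrt(w @ Sigma_x @ w)    # enforce w^T Sigma_x w = 1
```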
Methodology

For $c \neq 0$, $c\,w_{10}$ is also a maximizer of equation (2.1).

We therefore add the constraint $w_1^T \Sigma_x w_1 = 1$ to adjust for potentially different scales in the predictor space.
Methodology

A sequence of generalized Rayleigh quotient problems (NP-hard):
$$w_k^* = \arg\max_{w_k} w_k^T \Sigma_{xy} w_k \quad \text{s.t.} \quad w_k^T \Sigma_x w_k = 1, \; w_k^T \Sigma_x w_j^* = 0, \text{ where } 1 \le j < k.$$
Define $W^* = (w_1^*, \ldots, w_K^*)$.

Equivalent optimization problem:
$$W^* = \arg\max_{W \in \mathbb{R}^{p \times K}} \mathrm{tr}(W^T \Sigma_{xy} W) \quad \text{s.t.} \quad W^T \Sigma_x W = I_K.$$

Convex simultaneous regression problem: writing $\Sigma_{xy} = UU^T + \Sigma_\epsilon = \sum_{i=1}^K \lambda_i v_i v_i^T + \Sigma_\epsilon$,
$$V^* = \arg\min_V \frac{1}{2} \|U - \Sigma_x V\|_F^2.$$

The Rayleigh quotient problems $\longrightarrow$ a convex simultaneous regression problem that recovers the same principal subspace, i.e. $V^* V^{*T} = W^* W^{*T}$, under some mild conditions.
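The reformulation can be checked numerically at the population level: build $U$ from the top-$K$ eigenpairs of $\Sigma_{xy}$, then solve the simultaneous regression by least squares. A hedged numpy sketch (the function name and the eigenvalue clipping are my choices):

```python
import numpy as np

def simultaneous_regression(Sigma_xy, Sigma_x, K):
    """Form U with U U^T = sum_i lambda_i v_i v_i^T from the top-K
    eigenpairs of Sigma_xy, then solve V* = argmin 0.5*||U - Sigma_x V||_F^2."""
    vals, vecs = np.linalg.eigh(Sigma_xy)            # ascending order
    lam = np.clip(vals[-K:], 0.0, None)              # guard tiny negative eigenvalues
    U = vecs[:, -K:] * np.sqrt(lam)                  # scale columns by sqrt(lambda_i)
    V, *_ = np.linalg.lstsq(Sigma_x, U, rcond=None)  # column-wise least squares
    return V
```

Under the equivalence theorem stated later, $VV^T$ from this sketch should match $W^* W^{*T}$ from the Rayleigh quotient problems when the separability condition holds.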
Methodology

In reality, the covariance matrices $\Sigma_x$ and $\Sigma_{xy}$ are unknown, and the optimization problem we actually solve is
$$\widehat{V} = \arg\min_V \frac{1}{2} \|\widehat{U} - \widehat{\Sigma}_x V\|_F^2,$$
where $\widehat{U}$ satisfies $\widehat{\Sigma}_{xy} = \widehat{U}\widehat{U}^T + \widehat{\Sigma}_\epsilon = \widehat{B} + \widehat{\Sigma}_\epsilon$.
Optimization Problem in High Dimensions

When $p$ is relatively large compared with $n$, or $p > n$, adding an $\ell_1$ penalty to the reformulated problem yields the estimate
$$\widehat{V} = \arg\min_V \frac{1}{2} \|\widehat{U} - \widehat{\Sigma}_x V\|_F^2 + \lambda \|V\|_{1,1},$$
where $\|A\|_{1,1}$ denotes $\|(\|A_{\cdot 1}\|_1, \|A_{\cdot 2}\|_1, \cdots, \|A_{\cdot m}\|_1)\|_1$ for a matrix $A \in \mathbb{R}^{n \times m}$.
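Because $\|V\|_{1,1}$ decouples over columns, the penalized problem splits into $K$ independent lasso regressions of the columns of $\widehat{U}$ on $\widehat{\Sigma}_x$. A sketch using scikit-learn (this regression framing and the penalty rescaling are assumptions of the sketch, not the paper's stated algorithm):

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_V(U_hat, Sigma_x_hat, lam):
    """Column-wise lasso for argmin 0.5*||U_hat - Sigma_x_hat V||_F^2
    + lam*||V||_{1,1}."""
    p, K = U_hat.shape
    V_hat = np.zeros((p, K))
    for k in range(K):
        # sklearn minimizes (1/(2*n_rows))||y - Xw||^2 + alpha*||w||_1,
        # so alpha = lam / p matches the objective above up to a 1/p factor.
        fit = Lasso(alpha=lam / p, fit_intercept=False)
        fit.fit(Sigma_x_hat, U_hat[:, k])
        V_hat[:, k] = fit.coef_
    return V_hat
```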
Algorithm and Tuning Parameter Selection

LASSO.

Extended BIC (Chen and Chen, 2008) to select $\lambda_K$ for fixed $K$.

5-fold CV to select $K$.
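A hedged sketch of the extended-BIC step for fixed $K$ (the exact plug-ins — treating the $p$ rows as the effective sample size and the nonzero count as degrees of freedom — are assumptions of this sketch, adapted from Chen and Chen, 2008; `sparse_V` is the sketch above):

```python
import numpy as np

def ebic_select(U_hat, Sigma_x_hat, lambdas, gamma=0.5):
    """Pick lambda minimizing an extended-BIC score over a grid."""
    p = Sigma_x_hat.shape[0]
    best_lam, best_score = None, np.inf
    for lam in lambdas:
        V = sparse_V(U_hat, Sigma_x_hat, lam)        # column-wise lasso fit
        rss = np.sum((U_hat - Sigma_x_hat @ V) ** 2)
        df = np.count_nonzero(V)                     # support size as df
        # EBIC: n*log(RSS/n) + df*log(n) + 2*gamma*df*log(p), with n = p here.
        score = p * np.log(rss / p) + df * np.log(p) + 2 * gamma * df * np.log(p)
        if score < best_score:
            best_lam, best_score = lam, score
    return best_lam
```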
Theoretical Properties

To make the signal and residual separable with respect to $\Sigma_{xy}$, we need the separability condition:
$$\lambda_{\min}\left(\Sigma_{xy}^{-1/2} (UU^T) \Sigma_{xy}^{-1/2}\right) > \lambda_{\max}\left(\Sigma_{xy}^{-1/2} \Sigma_\epsilon \Sigma_{xy}^{-1/2}\right).$$

Theorem (Equivalence). When $p < n$, $\mathcal{V} = \mathrm{span}(V^*)$ can recover $\mathcal{W} = \mathrm{span}(W^*)$ exactly, that is, $\mathcal{V} = \mathcal{W}$, or equivalently $V^* V^{*T} = W^* W^{*T}$, if the separability condition holds.
Theoretical Properties

Theorem (Estimation Error). Under proper conditions, with probability going to 1, $\widehat{V}$ converges to $V^*$.
Numerical Results: Simulation I

Model: $Y(t) = X\beta(t) + \epsilon(t)$.

$X_i \overset{\text{iid}}{\sim} N_p(0, \Sigma)$, where $\Sigma_{jj'} = 0.5^{|j - j'|}$ for $1 \le j, j' \le p$.

Compact support $\mathcal{T} = [0, 1]$.

$\epsilon_i(t) \overset{\text{iid}}{\sim}$ a Gaussian process with mean 0 and covariance function $K(s, t) = \exp\{-3(s - t)^2\}$ for $0 \le s, t \le 1$.
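The design above is straightforward to reproduce; a numpy sketch (the sample size $n$, grid size $m$, and coefficient function $\beta(t)$ are illustrative choices not given on this slide):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 100, 50, 101
t = np.linspace(0.0, 1.0, m)                 # compact support T = [0, 1]

# Predictors: X_i iid N_p(0, Sigma) with Sigma_{jj'} = 0.5^{|j - j'|}.
idx = np.arange(p)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Errors: mean-zero Gaussian process with K(s, t) = exp{-3 (s - t)^2},
# evaluated on the grid.
Kst = np.exp(-3.0 * (t[:, None] - t[None, :]) ** 2)
eps = rng.multivariate_normal(np.zeros(m), Kst, size=n)

# Hypothetical coefficient function: first five predictors active.
beta = np.zeros((p, m))
beta[:5, :] = np.sin(2.0 * np.pi * t)

Y = X @ beta + eps                           # Y(t) = X beta(t) + eps(t)
```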