Supervised Principal Component Regression for Functional Data with High Dimensional Predictors
Xinyi (Cindy) Zhang, University of Toronto
xyi.zhang@mail.utoronto.ca
July 10, 2018

Joint work with

Overview
1. Motivation
2. Methodology: SPCR
3. Theoretical Properties: Equivalence; Estimation Convergence
4. Numerical Studies: Simulation; Real Data Application

Motivation
Functional magnetic resonance imaging (fMRI) is a noninvasive technique for studying brain activity.
Image courtesy of the Rebecca Saxe laboratory, MIT news, http://news.mit.edu/2011/brain-language-0301

Motivation
The fMRI dataset of each subject contains a time series of 3-D images.
[Figure: panels (a) and (b)]

Motivation
A high-dimensional set of clinical/demographic variables is collected for each subject.
The association between these variables and the fMRI data has not been well understood.

Related Methodology: PCA
Principal component analysis (PCA) can be applied to extract a lower-dimensional subspace that captures most of the variation in the covariates.

Related Methodology: Potential problems
However, PCA fails to capture any information relevant to the response when the principal subspace extracted from the covariates is orthogonal to the vectors of regression parameters.
⇓
Supervised Principal Component Regression

Methodology: Some notations
Covariance matrix $\Sigma_x = E(XX^{\top})$; cross-covariance matrix $\Sigma_{xy} = \int_{\mathcal{T}} E\{XY(t)\}\,[E\{XY(t)\}]^{\top}\,dt$, where $\mathcal{T}$ is a compact support.
$\{(X_i, Y_i(t)),\ i = 1, \ldots, n\} \overset{iid}{\sim} \{X, Y(t)\}$.
Empirical estimation $\hat{\Sigma}_x = n^{-1}\mathbf{X}^{\top}\mathbf{X}$, where $\mathbf{X} = (X_1, \ldots, X_n)^{\top} \in \mathbb{R}^{n \times p}$.
Empirical estimation $\hat{\Sigma}_{xy} = n^{-2}\int_{\mathcal{T}} \mathbf{X}^{\top}\mathbf{Y}(t)\mathbf{Y}(t)^{\top}\mathbf{X}\,dt$, where $\mathbf{Y}(t) = (Y_1(t), \ldots, Y_n(t))^{\top} \in \mathbb{R}^{n}$.
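The two empirical estimators can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes the curves are observed on a common grid `t` and approximates the integral over the support with trapezoid weights. The function name `empirical_covariances` and the grid-based storage of `Y` are assumptions made for the example.

```python
import numpy as np

def empirical_covariances(X, Y, t):
    """X: (n, p) predictors; Y: (n, m) curves sampled on the grid t (length m)."""
    t = np.asarray(t, dtype=float)
    n = X.shape[0]
    sigma_x = X.T @ X / n                      # n^{-1} X^T X
    XtY = X.T @ Y                              # (p, m); column j is X^T Y(t_j)
    # Trapezoid weights (an assumption) approximate the integral over the support.
    w = np.empty_like(t)
    w[1:-1] = (t[2:] - t[:-2]) / 2
    w[0] = (t[1] - t[0]) / 2
    w[-1] = (t[-1] - t[-2]) / 2
    sigma_xy = (XtY * w) @ XtY.T / n**2        # n^{-2} * integral of X^T Y(t) Y(t)^T X
    return sigma_x, sigma_xy

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((40, 5))
Y_demo = rng.standard_normal((40, 25))
sx_demo, sxy_demo = empirical_covariances(X_demo, Y_demo, np.linspace(0, 1, 25))
print(sx_demo.shape, sxy_demo.shape)
```

Both returned matrices are $p \times p$ and symmetric; $\hat{\Sigma}_{xy}$ is positive semidefinite by construction.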

Methodology
Start with $p < n$. Regressing $Y(t)$ on the projection $X^{\top}w_1$, the optimal regression function $\gamma^*(t)$ is the minimizer of the expected integrated residual sum of squares, defined as
$\mathrm{IRSS} = E\left[\int_{\mathcal{T}} \{Y(t) - X^{\top}w_1\gamma(t)\}^{\top}\{Y(t) - X^{\top}w_1\gamma(t)\}\,dt\right]$.

Methodology
$\Longrightarrow\ \gamma^*(t) = \{w_1^{\top}E(XX^{\top})w_1\}^{-1}w_1^{\top}E\{XY(t)\}$.
Plugging $\gamma^*(t)$ into IRSS yields
$\mathrm{IRSS}(\gamma^*) = \int_{\mathcal{T}} \left(E\{Y^{\top}(t)Y(t)\} - [E\{XY(t)\}]^{\top}w_1\{w_1^{\top}E(XX^{\top})w_1\}^{-1}w_1^{\top}[E\{XY(t)\}]\right)dt$.
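The expression for $\mathrm{IRSS}(\gamma^*)$ can be simplified one step further. The following worked version is a sketch that uses only the definitions $\Sigma_x = E(XX^{\top})$ and $\Sigma_{xy} = \int_{\mathcal{T}} E\{XY(t)\}[E\{XY(t)\}]^{\top}dt$ from the notation slide, and treats $w_1^{\top}\Sigma_x w_1$ as a positive scalar:

```latex
\begin{aligned}
[E\{XY(t)\}]^{\top} w_1 \{w_1^{\top}\Sigma_x w_1\}^{-1} w_1^{\top} E\{XY(t)\}
  &= \frac{w_1^{\top} E\{XY(t)\}\,[E\{XY(t)\}]^{\top} w_1}{w_1^{\top}\Sigma_x w_1}, \\
\mathrm{IRSS}(\gamma^*)
  &= \int_{\mathcal{T}} E\{Y^{\top}(t)Y(t)\}\,dt
     \;-\; \frac{w_1^{\top}\Sigma_{xy}\, w_1}{w_1^{\top}\Sigma_x w_1}.
\end{aligned}
```

The first term does not depend on $w_1$, so minimizing $\mathrm{IRSS}(\gamma^*)$ over $w_1$ amounts to maximizing a ratio of two quadratic forms in $w_1$.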

Methodology
Among all the possible directions of $w_1$, the one minimizing $\mathrm{IRSS}(\gamma^*)$ satisfies the following.
Proposition. If $w_{10}$ is a minimizer of $\mathrm{IRSS}(\gamma^*)$, then $w_{10}$ satisfies
$w_{10} = \arg\max_{w_1} \dfrac{w_1^{\top}\Sigma_{xy}w_1}{w_1^{\top}\Sigma_x w_1}$. (2.1)

Methodology
For $c \neq 0$, $c\,w_{10}$ is also a maximizer of equation (2.1).
We therefore add the constraint $w_1^{\top}\Sigma_x w_1 = 1$, which fixes the scale of $w_1$ and adjusts for potentially different scales in the predictor space.
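For a single direction with $p < n$ and an invertible $\Sigma_x$, the constrained ratio problem is a generalized eigenvalue problem. The sketch below is one way to solve it with NumPy alone, by whitening with $\Sigma_x^{-1/2}$; the function name `leading_direction` is hypothetical and the approach is an illustration rather than the method used in the talk.

```python
import numpy as np

def leading_direction(sigma_xy, sigma_x):
    """Maximize w^T sigma_xy w subject to w^T sigma_x w = 1 (sigma_x positive definite)."""
    # Symmetric inverse square root of sigma_x via its eigendecomposition.
    evals, evecs = np.linalg.eigh(sigma_x)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # After whitening, the problem is an ordinary symmetric eigenproblem.
    M = inv_sqrt @ sigma_xy @ inv_sqrt
    mu, Q = np.linalg.eigh((M + M.T) / 2)
    w = inv_sqrt @ Q[:, -1]        # eigh sorts eigenvalues in ascending order
    return w, mu[-1]               # direction and the maximal value of the ratio

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))
B = rng.standard_normal((4, 2))
w_demo, ratio_demo = leading_direction(B @ B.T, A.T @ A / 50)
print(w_demo, ratio_demo)
```

The returned vector satisfies the normalization constraint by construction, since $w^{\top}\Sigma_x w = q^{\top}q = 1$ for a unit eigenvector $q$.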

Methodology
A sequence of generalized Rayleigh quotient problems (NP hard):
$w_k^* = \arg\max_{w_k} w_k^{\top}\Sigma_{xy}w_k$ s.t. $w_k^{\top}\Sigma_x w_k = 1$, $w_k^{\top}\Sigma_x w_j^* = 0$, where $1 \le j < k$.
Define $W^* = (w_1^*, \ldots, w_K^*)$.
Equivalent optimization problem:
$W^* = \arg\max_{W \in \mathbb{R}^{p \times K}} \mathrm{tr}(W^{\top}\Sigma_{xy}W)$ s.t. $W^{\top}\Sigma_x W = I_K$.
Convex simultaneous regression problem:
$\Sigma_{xy} = UU^{\top} + \Sigma_{\epsilon} = \sum_{i=1}^{K}\lambda_i v_i v_i^{\top} + \Sigma_{\epsilon}$, $\quad V^* = \arg\min_{V} \frac{1}{2}\|U - \Sigma_x V\|_F^2$.
The Rayleigh quotient problems $\longrightarrow$ a convex simultaneous regression problem which recovers the same principal space, i.e. $V^*V^{*\top} = W^*W^{*\top}$ under some mild conditions.
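The simultaneous regression step has a closed-form least-squares solution when $\Sigma_x$ is invertible. The sketch below is a hedged illustration: it assumes that $U$ is built from the top $K$ eigenpairs of $\Sigma_{xy}$, with columns $\sqrt{\lambda_i}\,v_i$, which is one reading of the decomposition on the slide. The function name `simultaneous_regression` is hypothetical.

```python
import numpy as np

def simultaneous_regression(sigma_xy, sigma_x, K):
    """Return (U, V) with V minimizing 0.5 * ||U - sigma_x V||_F^2."""
    lam, vecs = np.linalg.eigh(sigma_xy)
    top = np.argsort(lam)[::-1][:K]                 # indices of the K largest eigenvalues
    # Assumption: U has columns sqrt(lambda_i) * v_i for the top K eigenpairs.
    U = vecs[:, top] * np.sqrt(np.clip(lam[top], 0.0, None))
    # Least-squares solve; equals sigma_x^{-1} U when sigma_x is invertible.
    V, *_ = np.linalg.lstsq(sigma_x, U, rcond=None)
    return U, V

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5))
B = rng.standard_normal((5, 2))
U_demo, V_demo = simultaneous_regression(B @ B.T, A.T @ A / 60, K=2)
print(V_demo.shape)
```

Because the objective is a sum of squared residuals over the columns of $U$, it separates into $K$ ordinary regressions that share the same design matrix $\Sigma_x$.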

Methodology
In reality, the covariance matrices $\Sigma_x$ and $\Sigma_{xy}$ are unknown, and the optimization problem we are actually solving is
$\hat{V} = \arg\min_{V} \frac{1}{2}\|\hat{U} - \hat{\Sigma}_x V\|_F^2$,
where $\hat{U}$ satisfies $\hat{\Sigma}_{xy} = \hat{U}\hat{U}^{\top} + \hat{\Sigma}_{\epsilon} = \hat{B} + \hat{\Sigma}_{\epsilon}$.

Optimization Problem in High Dimensions
When $p$ is relatively large compared with $n$, or $p > n$, adding an $\ell_1$ penalty to the reformulated problem gives the estimate
$\hat{V} = \arg\min_{V} \frac{1}{2}\|\hat{U} - \hat{\Sigma}_x V\|_F^2 + \lambda\|V\|_{1,1}$,
where $\|\cdot\|_{1,1}$ denotes $\|(\|A_{\cdot 1}\|_1, \|A_{\cdot 2}\|_1, \cdots, \|A_{\cdot m}\|_1)\|_1$ for a matrix $A \in \mathbb{R}^{n \times m}$.
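Since $\|V\|_{1,1}$ is the sum of absolute values of all entries of $V$, the penalized problem is a lasso-type problem and can be solved by proximal gradient descent. The sketch below uses plain ISTA with soft thresholding; the talk does not specify a solver, so the algorithm choice, the fixed iteration count, and the names `soft_threshold` and `sparse_V` are assumptions for illustration.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise proximal operator of tau * |.|."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def sparse_V(U, sigma_x, lam, n_iter=500):
    """ISTA for 0.5 * ||U - sigma_x V||_F^2 + lam * sum |V_ij|."""
    # Lipschitz constant of the gradient: squared spectral norm of sigma_x.
    L = np.linalg.norm(sigma_x, 2) ** 2
    V = np.zeros_like(U, dtype=float)
    for _ in range(n_iter):
        grad = sigma_x.T @ (sigma_x @ V - U)          # gradient of the smooth part
        V = soft_threshold(V - grad / L, lam / L)     # proximal step
    return V

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 50))                     # p > n, so sigma_x is singular
S_demo = A.T @ A / 30
U_demo = rng.standard_normal((50, 2))
V_demo = sparse_V(U_demo, S_demo, lam=0.5)
print(np.count_nonzero(V_demo), "nonzero entries out of", V_demo.size)
```

Larger values of `lam` set more entries of the estimate exactly to zero, which is what makes the estimate usable when $p > n$.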

Algorithm and Tuning Parameter Selection
LASSO.
Extended BIC (Chen and Chen, 2008) to select $\lambda_K$ for fixed $K$.
5-fold CV to select $K$.

Theoretical Properties
To make the signal and residual separable with respect to $\Sigma_{xy}$, we need the separability condition:
$\lambda_{\min}(\Sigma_{xy}^{-1/2}(UU^{\top})\Sigma_{xy}^{-1/2}) > \lambda_{\max}(\Sigma_{xy}^{-1/2}\Sigma_{\epsilon}\Sigma_{xy}^{-1/2})$.
Theorem (Equivalence). When $p < n$, $\mathcal{V} = \mathrm{span}(V^*)$ can recover $\mathcal{W} = \mathrm{span}(W^*)$ exactly, that is, $\mathcal{V} = \mathcal{W}$, or equivalently $V^*V^{*\top} = W^*W^{*\top}$, if the separability condition holds.

Theoretical Properties
Theorem (Estimation Error). Under proper conditions, with probability going to 1, $\hat{V}$ converges to $V^*$.

Numerical Results: Simulation I
$X_i \overset{iid}{\sim} N_p(0, \Sigma)$, where $\Sigma_{jj'} = 0.5^{|j - j'|}$ for $1 \le j, j' \le p$.
Compact support $\mathcal{T} = [0, 1]$.
$\epsilon_i(t) \overset{iid}{\sim}$ a Gaussian process with mean 0 and covariance function $K(s, t) = \exp\{-3(s - t)^2\}$ for $0 \le s, t \le 1$.
$Y(t) = X\beta(t) + \epsilon(t)$.
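A data-generating sketch for this design is given below. The predictor covariance, the support, and the Gaussian-process kernel follow the slide; the coefficient functions $\beta(t)$ are not given in the source, so the two sinusoidal components used here are purely hypothetical placeholders, as are the grid size `m` and the function name `simulate`.

```python
import numpy as np

def simulate(n, p, m=50, seed=0):
    """Draw (X, Y) on an m-point grid over [0, 1]; requires p >= 2."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])      # Sigma_jj' = 0.5^|j - j'|
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # Gaussian-process noise with kernel exp{-3 (s - t)^2}, sampled through an
    # eigendecomposition because the kernel matrix is numerically near-singular.
    Kmat = np.exp(-3.0 * (t[:, None] - t[None, :]) ** 2)
    evals, evecs = np.linalg.eigh(Kmat)
    root = evecs * np.sqrt(np.clip(evals, 0.0, None))       # root @ root.T ~= Kmat
    eps = rng.standard_normal((n, m)) @ root.T
    # Hypothetical beta(t): only the first two predictors are active.
    beta = np.zeros((p, m))
    beta[0] = np.sin(2 * np.pi * t)
    beta[1] = np.cos(2 * np.pi * t)
    Y = X @ beta + eps                                      # Y(t) = X beta(t) + eps(t)
    return X, Y, t, beta

X_sim, Y_sim, t_sim, beta_sim = simulate(n=100, p=20)
print(X_sim.shape, Y_sim.shape)
```

Each row of `Y_sim` is one response curve evaluated on the grid, so the output can be passed to any routine that expects discretized functional responses.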
