Optimal Estimation for Quantile Regression with Functional Response

  1. Optimal Estimation for Quantile Regression with Functional Response. Xiao Wang, Purdue University. Mathematical and Statistical Challenges in Neuroimaging Data Analysis. X. Wang (Purdue) Quantile Regression with Functional Response BIRS 1 / 25

  2. Acknowledgment. SAMSI CCNS. Collaborators: Zhengwu Zhang (SAMSI), Linglong Kong (University of Alberta), Hongtu Zhu (UNC Chapel Hill).

  3. Motivation: Functional Regression with Functional Response. Functional regression (Morris 2015); functional response (Hongtu Zhu ...):
     $$Y_i(s) = X_i^T \beta(s) + \eta_i(s), \qquad i = 1, \dots, n,$$
     which recovers the conditional mean of $Y(s)$ given $X$ and the location $s$. In imaging data, however, various segmentation and registration methods produce preprocessing results that are inconsistent or contain errors; the error distributions are unknown, even though many applications assume Gaussian errors for convenience; and the error variances vary spatially within the brain. Quantile regression (QR) gives a fuller picture of the data by modeling conditional quantiles rather than only the conditional mean. These features make QR more appealing than its cousin, ordinary least squares. In this paper, we aim to recover the $100\tau\%$ quantile of the conditional distribution of $Y(s)$ given $X$ and the location $s$.

  4. Motivation: Quantile Regression. Quantile regression (Koenker and Bassett 1978) vs. mean regression for the model
     $$y_i = f(x_i) + \epsilon_i, \qquad i = 1, \dots, n.$$
     Mean regression uses the quadratic loss $r^2$; quantile regression uses the check function
     $$\rho_\tau(r) = \begin{cases} \tau r, & r > 0, \\ -(1 - \tau) r, & \text{otherwise}. \end{cases}$$
     Quantile regression provides better estimators than mean regression when the data are skewed or contain outliers, and it does not require specifying an error distribution. Many nonparametric and semiparametric quantile regression models ... (Koenker 2005; ...).
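
A minimal NumPy sketch of the check function, offered as an illustration rather than code from the talk. The helper `empirical_quantile_by_check_loss` (a hypothetical name) minimizes $\sum_i \rho_\tau(y_i - c)$ over the observed values. That search is sufficient because the objective is convex and piecewise linear with kinks at the data points, so some minimizer is always a data point.

```python
import numpy as np

def check_loss(r, tau):
    """Check function rho_tau(r): tau * r if r > 0, otherwise -(1 - tau) * r."""
    r = np.asarray(r, dtype=float)
    return np.where(r > 0, tau * r, -(1.0 - tau) * r)

def empirical_quantile_by_check_loss(y, tau):
    """Return a minimizer c of sum_i rho_tau(y_i - c), searched over the data.

    The objective is convex and piecewise linear with kinks at the y_i,
    so some minimizer is always one of the observed values.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).sum() for c in y]
    return float(y[int(np.argmin(losses))])

y = [1, 2, 3, 4, 100]                             # one gross outlier
print(empirical_quantile_by_check_loss(y, 0.5))   # median: 3.0
print(np.mean(y))                                 # mean: 22.0, pulled by the outlier
```

With τ = 0.5, the check-loss minimizer is the median and ignores the size of the outlier. The squared-loss minimizer (the mean) moves far toward it.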

  5. Motivation: ADNI DTI Data. Dataset: 203 subjects from ADNI. Response: mean Fractional Anisotropy (FA) values along the midsagittal corpus callosum skeleton (TBSS pipeline). Covariates: Gender, Age, Alzheimer's Disease Assessment Scale, Mini-Mental State Examination.
     Figure: FA curves along the corpus callosum skeleton.

  6. Motivation: ADNI Hippocampus Image Data. Dataset: 403 subjects from ADNI. Response: hippocampus images. Covariates: Gender, Age, and Behavior score.
     Figure: Observed left hippocampus images.

  7. Quantile Regression with Functional Response. For a given $\tau \in (0, 1)$, consider a quantile regression model with varying coefficients and functional responses,
     $$Y(s) = X^T \beta_\tau(s) + \eta_\tau(s),$$
     where $\eta_\tau(\cdot)$ is a stochastic process whose $\tau$th quantile is zero for each fixed $s$ given $X$. The conditional quantile function of $Y(s)$ given $X$ for any $\tau \in (0, 1)$ can then be expressed as
     $$Q_{Y(s)}(\tau \mid X) = X^T \beta_\tau(s).$$
     The unknown parameters are $\beta_\tau = (\beta_1, \dots, \beta_p)$, where each $\beta_k \in \mathcal{H}(K)$, an RKHS generated by a positive-definite kernel $K$, for example $K(s, t) = (1 + \langle s, t \rangle)^d$ or $K(s, t) = \exp(-\|s - t\|^2 / (2\sigma^2))$. Suppose that we observe $(X_i, Y_i(s_{ij}))$ for subjects $i = 1, \dots, n$ at locations $s_{i1}, \dots, s_{i m_i}$. Our goal is to study the estimation of the coefficient functions $\beta_{\tau k}$, $k = 1, \dots, p$.
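
To make the two example kernels concrete, here is a short NumPy sketch that builds their Gram matrices on a grid of locations. The function names, `d`, `sigma`, and the 20-point grid on [0, 1] are illustrative choices, not taken from the talk. Because $K$ is positive definite, every Gram matrix it produces is symmetric positive semidefinite, and the sketch checks that numerically.

```python
import numpy as np

def poly_kernel(S, T, d=2):
    """Polynomial kernel K(s, t) = (1 + <s, t>)^d; rows of S and T are locations."""
    return (1.0 + S @ T.T) ** d

def gauss_kernel(S, T, sigma=1.0):
    """Gaussian kernel K(s, t) = exp(-||s - t||^2 / (2 sigma^2))."""
    sq = (S ** 2).sum(axis=1)[:, None] + (T ** 2).sum(axis=1)[None, :] - 2.0 * S @ T.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

# Locations s_1, ..., s_m on [0, 1], stored as an (m, 1) array.
s = np.linspace(0.0, 1.0, 20)[:, None]
G_poly = poly_kernel(s, s, d=2)
G_gauss = gauss_kernel(s, s, sigma=0.2)

# Gram matrices of a positive-definite kernel are symmetric PSD.
print(np.allclose(G_gauss, G_gauss.T), np.linalg.eigvalsh(G_gauss).min() > -1e-8)
print(np.linalg.eigvalsh(G_poly).min() > -1e-8)
```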

  8. Quantile Regression with Functional Response: Loss Function. Fixed design: the functional responses are observed at the same locations across curves, that is, $m_1 = m_2 = \cdots = m_n := m$ and $s_{1j} = s_{2j} = \cdots = s_{nj} := s_j$ for $j = 1, \dots, m$. Random design: the $s_{ij}$ are independently sampled from a distribution $\pi(s)$. $L_2$-distance: for two function vectors $f_1, f_2 \in \mathcal{F}^p$, define
     $$\|f_1 - f_2\|_{s,2}^2 = \begin{cases} \dfrac{1}{m} \displaystyle\sum_{j=1}^{m} \sum_{k=1}^{p} \big(f_{1k}(s_j) - f_{2k}(s_j)\big)^2, & \text{fixed design}, \\ \displaystyle\sum_{k=1}^{p} \int_{\mathcal{S}} \big(f_{1k}(s) - f_{2k}(s)\big)^2 \pi(s)\, ds, & \text{random design}. \end{cases}$$
     We measure the accuracy of an estimator $\hat\beta_\tau$ by
     $$\mathcal{E}_{n\tau}(\hat\beta_\tau, \beta_\tau) = \|\hat\beta_\tau - \beta_\tau\|_{s,2}^2.$$
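
A minimal NumPy sketch of the two distances, under assumptions not stated on the slide. In the fixed design, each function vector is stored as an $(m, p)$ array of values at $s_1, \dots, s_m$. For the random design, the integral against $\pi$ is approximated by Monte Carlo, which is a swapped-in numerical technique. `sq_dist_fixed`, `sq_dist_random_mc`, and `sample_pi` are hypothetical names.

```python
import numpy as np

def sq_dist_fixed(F1, F2):
    """Fixed design: (1/m) * sum_j sum_k (f1k(s_j) - f2k(s_j))^2.

    F1, F2: arrays of shape (m, p); row j holds the p coefficient functions at s_j.
    """
    D = np.asarray(F1, dtype=float) - np.asarray(F2, dtype=float)
    return float((D ** 2).sum() / D.shape[0])

def sq_dist_random_mc(f1, f2, sample_pi, n_draws=100_000, seed=0):
    """Random design: Monte Carlo estimate of sum_k int (f1k(s) - f2k(s))^2 pi(s) ds.

    f1, f2 map an (N,) array of locations to an (N, p) array of values;
    sample_pi(rng, N) draws N locations from pi.
    """
    rng = np.random.default_rng(seed)
    s = sample_pi(rng, n_draws)
    D = f1(s) - f2(s)
    return float((D ** 2).sum(axis=1).mean())

# Example with p = 2: f1 = (s, 0), f2 = (0, 0), pi = Uniform(0, 1).
def f1(s):
    return np.column_stack([s, np.zeros_like(s)])

def f2(s):
    return np.zeros((s.size, 2))

def uniform(rng, N):
    return rng.uniform(0.0, 1.0, size=N)

print(sq_dist_random_mc(f1, f2, uniform))   # close to int_0^1 s^2 ds = 1/3
grid = np.linspace(0.0, 1.0, 5)
print(sq_dist_fixed(f1(grid), f2(grid)))    # (0 + 1/16 + 1/4 + 9/16 + 1) / 5
```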

  9. Theoretical Results: Rate of Convergence, Lower Bound. Fix $\tau \in (0, 1)$. Suppose the eigenvalues $\{\rho_k : k \ge 1\}$ of the reproducing kernel $K$ satisfy $\rho_k \asymp k^{-2r}$ for some constant $0 < r < \infty$. Then:
     a. For the fixed design,
     $$\lim_{a_\tau \to 0} \lim_{n, m \to \infty} \inf_{\tilde\beta_\tau} \sup_{\beta_\tau \in \mathcal{F}^p} P\Big\{\mathcal{E}_{n\tau}(\tilde\beta_\tau, \beta_\tau) \ge a_\tau \big(n^{-1} + m^{-2r}\big)\Big\} = 1. \tag{1}$$
     b. For the random design,
     $$\lim_{a_\tau \to 0} \lim_{n, m \to \infty} \inf_{\tilde\beta_\tau} \sup_{\beta_\tau \in \mathcal{F}^p} P\Big\{\mathcal{E}_{n\tau}(\tilde\beta_\tau, \beta_\tau) \ge a_\tau \big((nm)^{-2r/(2r+1)} + n^{-1}\big)\Big\} = 1. \tag{2}$$
     The infima above are taken over all possible estimators $\tilde\beta_\tau$ based on the training data. If $\tau$ ranges over a compact subinterval of $(0, 1)$, $a_\tau$ can be chosen not to depend on $\tau$.

  10. Theoretical Results: Rate of Convergence, Fixed Design. Under the common (fixed) design, the minimax rate is of order $m^{-2r} + n^{-1}$. This rate is fundamentally different from the usual nonparametric rate $(nm)^{-2r/(2r+1)}$ (Stone 1982): it is jointly determined by the sampling frequency $m$ and the number of curves $n$, rather than by the total number of observations $mn$. When the curves are sparsely sampled, that is, $m = O(n^{1/(2r)})$, the optimal rate is of order $m^{-2r}$, determined solely by the sampling frequency. On the other hand, when the sampling frequency is high, that is, $m \gg n^{1/(2r)}$, the optimal rate remains $1/n$ regardless of $m$.
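
A small helper can make the fixed-design phase transition concrete. It evaluates only the order of the rate, with constants dropped. The function names and the choice $n = 10{,}000$, $r = 1$ (so the boundary is $m = 100$) are illustrative.

```python
def fixed_design_rate(n, m, r):
    """Order of the fixed-design minimax rate m^(-2r) + n^(-1) (constants dropped)."""
    return m ** (-2.0 * r) + 1.0 / n

def fixed_design_regime(n, m, r):
    """'sparse' if the m^(-2r) term dominates (m below n^(1/(2r))), else 'dense'."""
    return "sparse" if m ** (-2.0 * r) > 1.0 / n else "dense"

n, r = 10_000, 1.0          # boundary at m = n^(1/(2r)) = 100
for m in (10, 1_000, 10_000):
    print(m, fixed_design_regime(n, m, r), fixed_design_rate(n, m, r))
```

Raising $m$ from 1,000 to 10,000 barely changes the rate, because the $1/n$ term already dominates above the boundary.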

  11. Theoretical Results: Rate of Convergence, Random Design. As in the fixed design, the optimal rate of convergence exhibits a phase transition, with the boundary at $m = n^{1/(2r)}$; this is exactly where $(nm)^{-2r/(2r+1)} = n^{-1}$. When the sampling frequency is low, that is, $m = O(n^{1/(2r)})$, the optimal rate is of order $(nm)^{-2r/(2r+1)}$, which depends jointly on $m$ and $n$. When the sampling frequency is high, $m \gg n^{1/(2r)}$, the optimal rate is always $1/n$ and does not depend on $m$.

  12. Theoretical Results: Rate of Convergence, Fixed vs. Random Design. When $m$ is above the boundary, that is, $m \gg n^{1/(2r)}$, the fixed and random designs share the same rate $1/n$. When $m$ is below the boundary, that is, $m \ll n^{1/(2r)}$, the random design is superior to the fixed design because it offers a faster rate of convergence: there, $(nm)^{-2r/(2r+1)} / m^{-2r} = (m^{2r}/n)^{2r/(2r+1)} \to 0$.
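
The comparison between the two designs can be checked numerically with a rough sketch. It computes the order of each rate, with constants dropped, at a few values of $m$ on either side of the boundary $n^{1/(2r)}$. The helper names and the values $n = 10{,}000$, $r = 1$ are illustrative assumptions.

```python
def fixed_design_rate(n, m, r):
    """Order of the fixed-design rate m^(-2r) + n^(-1)."""
    return m ** (-2.0 * r) + 1.0 / n

def random_design_rate(n, m, r):
    """Order of the random-design rate (nm)^(-2r/(2r+1)) + n^(-1)."""
    return (n * m) ** (-2.0 * r / (2.0 * r + 1.0)) + 1.0 / n

def boundary(n, r):
    """Phase-transition boundary m* = n^(1/(2r))."""
    return n ** (1.0 / (2.0 * r))

n, r = 10_000, 1.0
print("boundary m* =", boundary(n, r))
for m in (10, 1_000, 100_000):
    f, g = fixed_design_rate(n, m, r), random_design_rate(n, m, r)
    # f / g > 1 means the random design converges faster at this (n, m).
    print(m, f, g, f / g)
```

Below the boundary ($m = 10$), the fixed-design rate is more than ten times larger. Far above it ($m = 100{,}000$), the ratio is close to 1.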

  13. Computation of the Estimator: Objective Function. Penalized estimator: minimize
     $$\frac{1}{mn} \sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau\big(Y_i(s_{ij}) - X_i^T \beta(s_{ij})\big) + \lambda \sum_{k=1}^{p} \|\beta_k\|_K^2.$$
     Representer theorem:
     $$\hat\beta_k(s) = \sum_{i=1}^{\tilde m} \theta_i \xi_i(s) + \sum_{j=1}^{m} \beta_j K(s_j, s), \qquad k = 1, \dots, p.$$
     Matrix form: minimize
     $$\frac{1}{mn} \sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau\big(Y_{ij} - b_{ij}^T \theta - a_{ij}^T \beta\big) + \lambda \beta^T \Sigma \beta.$$
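
A direct NumPy evaluation of the matrix-form objective can help check a solver's output. The array layout is an assumption, because the slide does not say how $b_{ij}$, $a_{ij}$, and $\Sigma$ are assembled from the representer expansion. Here `Y` is $n \times m$, `B[i, j]` holds $b_{ij}$, and `A[i, j]` holds $a_{ij}$. `penalized_objective` is a hypothetical name.

```python
import numpy as np

def check_loss(r, tau):
    """rho_tau(r) = tau * r if r > 0, otherwise -(1 - tau) * r."""
    r = np.asarray(r, dtype=float)
    return np.where(r > 0, tau * r, -(1.0 - tau) * r)

def penalized_objective(Y, B, A, theta, beta, Sigma, lam, tau):
    """(1/(mn)) sum_ij rho_tau(Y_ij - b_ij^T theta - a_ij^T beta) + lam * beta^T Sigma beta.

    Y: (n, m) responses; B: (n, m, q) with B[i, j] = b_ij; A: (n, m, L) with A[i, j] = a_ij.
    """
    n, m = Y.shape
    resid = Y - B @ theta - A @ beta          # (n, m) array of residuals
    fit = check_loss(resid, tau).sum() / (m * n)
    return float(fit + lam * beta @ Sigma @ beta)

# Tiny example: every residual equals -2, so each check loss is 0.5 * 2 = 1.
Y = np.zeros((2, 3))
B = np.zeros((2, 3, 1))
A = np.ones((2, 3, 2))
print(penalized_objective(Y, B, A, np.zeros(1), np.ones(2), np.eye(2), lam=0.5, tau=0.5))
```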

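  14. Computation of the Estimator: ADMM Algorithm. Write the optimization in the equivalent form (rescaling $\lambda$ to absorb the factor $1/(mn)$)
     $$\min \sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau(Y_{ij} - u_{ij}) + \lambda \beta^T \Sigma \beta \quad \text{subject to} \quad u_{ij} = b_{ij}^T \theta + a_{ij}^T \beta, \quad i = 1, \dots, n, \; j = 1, \dots, m.$$
     Augmented Lagrangian:
     $$L_\eta(u, \xi, \theta, \beta) = \sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau(Y_{ij} - u_{ij}) + \lambda \beta^T \Sigma \beta + \sum_{i=1}^{n} \sum_{j=1}^{m} \xi_{ij}\big(u_{ij} - b_{ij}^T \theta - a_{ij}^T \beta\big) + \frac{\eta}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \big(u_{ij} - b_{ij}^T \theta - a_{ij}^T \beta\big)^2.$$
     ADMM updates (the superscript $k$ indexes iterations):
     $$u_{ij}^{k+1} = \arg\min_{u_{ij}} \Big\{\rho_\tau(Y_{ij} - u_{ij}) + \xi_{ij}^{k}\big(u_{ij} - b_{ij}^T \theta^{k} - a_{ij}^T \beta^{k}\big) + \frac{\eta}{2}\big(u_{ij} - b_{ij}^T \theta^{k} - a_{ij}^T \beta^{k}\big)^2\Big\},$$
     $$(\theta^{k+1}, \beta^{k+1}) = \arg\min_{\theta, \beta} \Big\{\lambda \beta^T \Sigma \beta + \sum_{i=1}^{n} \sum_{j=1}^{m} \Big[-\xi_{ij}^{k}\big(b_{ij}^T \theta + a_{ij}^T \beta\big) + \frac{\eta}{2}\big(u_{ij}^{k+1} - b_{ij}^T \theta - a_{ij}^T \beta\big)^2\Big]\Big\},$$
     $$\xi_{ij}^{k+1} = \xi_{ij}^{k} + \eta\big(u_{ij}^{k+1} - b_{ij}^T \theta^{k+1} - a_{ij}^T \beta^{k+1}\big).$$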
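
A compact NumPy sketch of these three updates. It is not the authors' code and relies on several assumptions:

- The index pairs $(i, j)$ are flattened into a single index.
- The rows $[b_{ij}^T, a_{ij}^T]$ are stacked into one design matrix `D`.
- The penalty is written as $\lambda w^T P w$ with $w = (\theta, \beta)$ and a zero block of `P` for $\theta$.
- $2\lambda P + \eta D^T D$ is invertible.

The $u$-step completes the square and applies the proximal operator of $\rho_\tau$. The $(\theta, \beta)$-step solves its normal equations $(2\lambda P + \eta D^T D) w = D^T(\xi + \eta u)$. `admm_quantile`, `prox_check`, `eta`, and `n_iter` are illustrative names.

```python
import numpy as np

def prox_check(z, tau, t):
    """Elementwise argmin_x rho_tau(x) + (x - z)^2 / (2 t), i.e. S_{tau,0,t}(z)."""
    z = np.asarray(z, dtype=float)
    return np.where(z > t * tau, z - t * tau,
                    np.where(z < -t * (1.0 - tau), z + t * (1.0 - tau), 0.0))

def admm_quantile(y, D, P, lam, tau, eta=1.0, n_iter=500):
    """ADMM for min_w sum_l rho_tau(y_l - u_l) + lam * w^T P w  s.t.  u = D w.

    y: (N,) stacked responses Y_ij; D: (N, d) stacked rows [b_ij^T, a_ij^T];
    P: (d, d) PSD penalty matrix (zero block for theta, Sigma block for beta).
    """
    N, d = D.shape
    w, u, xi = np.zeros(d), np.zeros(N), np.zeros(N)
    # The (theta, beta)-step solves the same linear system at every iteration.
    M = 2.0 * lam * P + eta * D.T @ D
    for _ in range(n_iter):
        # u-step: completing the square gives the prox of rho_tau at y - v.
        v = D @ w - xi / eta
        u = y - prox_check(y - v, tau, 1.0 / eta)
        # (theta, beta)-step: quadratic in w, solved via its normal equations.
        w = np.linalg.solve(M, D.T @ (xi + eta * u))
        # Dual step on the constraint u = D w.
        xi = xi + eta * (u - D @ w)
    return w

# Sanity check: intercept-only model, no penalty, tau = 0.5 -> sample median.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
w = admm_quantile(y, np.ones((5, 1)), np.zeros((1, 1)), lam=0.0, tau=0.5, n_iter=2000)
print(w)
```

With a single intercept column and no penalty, the iterates should approach the sample median 3, regardless of the outlier at 100.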

  15. Computation of the Estimator: ADMM Algorithm (continued). Consider the proximal operator of $\rho_\tau$ with parameters $\mu$ and $\lambda$:
     $$\operatorname{prox}_{\rho_\tau, \mu, \lambda}(v) = \arg\min_x \Big\{\rho_\tau(x - \mu) + \frac{1}{2\lambda}(x - v)^2\Big\}. \tag{3}$$
     The solution to (3) can be obtained explicitly: $x^+ = \operatorname{prox}_{\rho_\tau, \mu, \lambda}(v) = S_{\tau, \mu, \lambda}(v)$, where
     $$S_{\tau, \mu, \lambda}(v) = \begin{cases} v - \lambda\tau, & v > \mu + \lambda\tau, \\ \mu, & \mu - \lambda(1 - \tau) \le v \le \mu + \lambda\tau, \\ v + \lambda(1 - \tau), & v < \mu - \lambda(1 - \tau). \end{cases}$$
     When $\tau = 1/2$ and $\mu = 0$, $S_{\tau, \mu, \lambda}(\cdot)$ is the well-known soft-thresholding operator,
     $$S_{1/2, 0, \lambda}(v) = \Big(1 - \frac{\lambda}{2|v|}\Big)_+ v \qquad (v \ne 0),$$
     which is a shrinkage operator.
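
The closed-form proximal operator can be checked against a brute-force grid search with a short sketch. The parameter values ($\tau = 0.3$, $\mu = 1$, $\lambda = 2$, giving the interval $[-0.4, 1.6]$ on which the minimizer is $\mu$) and the function names are illustrative assumptions. The grid search is only a verification device, not part of the method.

```python
import numpy as np

def check_loss(r, tau):
    """rho_tau(r) = tau * r if r > 0, otherwise -(1 - tau) * r."""
    r = np.asarray(r, dtype=float)
    return np.where(r > 0, tau * r, -(1.0 - tau) * r)

def prox_shifted_check(v, tau, mu, lam):
    """S_{tau,mu,lam}(v) = argmin_x rho_tau(x - mu) + (x - v)^2 / (2 lam)."""
    v = np.asarray(v, dtype=float)
    return np.where(v > mu + lam * tau, v - lam * tau,
                    np.where(v < mu - lam * (1.0 - tau), v + lam * (1.0 - tau), mu))

def prox_by_grid_search(v, tau, mu, lam, grid):
    """Brute-force minimizer of the same objective over a fine grid (for checking)."""
    obj = check_loss(grid - mu, tau) + (grid - v) ** 2 / (2.0 * lam)
    return float(grid[np.argmin(obj)])

grid = np.linspace(-5.0, 5.0, 100_001)     # spacing 1e-4
for v in (3.0, 0.5, -2.0):                 # above, inside, below the interval
    print(v, float(prox_shifted_check(v, 0.3, 1.0, 2.0)),
          prox_by_grid_search(v, 0.3, 1.0, 2.0, grid))
```

In the middle case, the minimizer is $\mu$. At $x = \mu$, the subdifferential of the objective is $[-(1-\tau), \tau] + (\mu - v)/\lambda$, and it contains zero exactly when $\mu - \lambda(1-\tau) \le v \le \mu + \lambda\tau$.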
