Fast and Accurate Inference for the Smoothing Parameter in Semiparametric Models
Alex Trindade (alex.trindade@ttu.edu)
Dept. of Mathematics & Statistics, Texas Tech University
Joint work with Rob Paige, Missouri University of Science and Technology
Funded in part by the National Security Agency, Grants: H98230-09-1-0071 (Paige) & H98230-08-1-0071 (Trindade)
May 2011
Running title: SPBB Inference for Splines
Outline
1. Motivation
   - The need for a semiparametric smoothing approach...
   - Penalized spline models
   - Penalized splines as linear mixed models (LMMs)
2. Main Result: Inference on Smoothing Parameter
   - Estimators as roots of quadratic estimating equations (QEEs)
   - Saddlepoint-based bootstrap (SPBB) inference for QEEs
   - Exact ML & REML inference in LMMs
3. Simulations: Confidence Intervals
   - Coverages, lengths, compute times
4. Application: The Fossil Data
The LIDAR Data (Ruppert, Wand, & Carroll, 2003)
Model: $y = \mu(x) + \text{error}$.
Goal: estimate the mean function $\mu(x)$, i.e. smooth the data.
[Figure: scatterplot of the LIDAR data; x-axis: range (about 400 to 700); y-axis: log ratio (about −0.8 to 0.0).]
The Fossil Data (Ruppert, Wand, & Carroll, 2003)
[Figure: scatterplot of the fossil data; x-axis: age (millions of years), about 95 to 120; y-axis: strontium ratio, about 0.70720 to 0.70750. Plot annotations: "dip here?" and "obvious dip!".]
Penalized Spline Model (degree $p$, with $K$ knots)
$$\mu(x) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{k=1}^{K} u_k (x - \kappa_k)_+^p$$
For $n$ obs $(x_i, y_i)$, write in matrix form: $\mu = X\beta + Zu \equiv B\theta$.
Model can allow for autocorrelation, $R$, in residuals (e.g. time series).
Estimate $\theta$ by minimizing
$$\hat{\theta}_{PS} = \arg\min_{\theta} \left\{ (y - B\theta)' R^{-1} (y - B\theta) + \alpha\, u'u \right\}$$
$\alpha$ is a smoothing parameter controlling balance between:
- fidelity to data ($\alpha = 0$)
- smoothness of fit ($\alpha = \infty$)
[Figure: LIDAR: linear spline fits with max and min smoothing (24 knots); x-axis: range; y-axis: log ratio; legend: $\alpha = 0$ and $\alpha = \infty$.]
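The penalized criterion on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes uncorrelated residuals ($R = I$), degree $p \ge 1$, and the truncated power basis from the formula above; the function names `truncated_power_basis` and `penalized_spline_fit` are hypothetical.

```python
import numpy as np

def truncated_power_basis(x, knots, p=1):
    """Return X (polynomial part) and Z (truncated power part), assuming p >= 1."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.vander(x, p + 1, increasing=True)               # columns 1, x, ..., x^p
    Z = np.maximum(x[:, None] - knots[None, :], 0.0) ** p  # (x - kappa_k)_+^p
    return X, Z

def penalized_spline_fit(x, y, knots, alpha, p=1):
    """Minimize (y - B theta)'(y - B theta) + alpha * u'u, assuming R = I."""
    X, Z = truncated_power_basis(x, knots, p)
    B = np.hstack([X, Z])
    # Penalty matrix: zeros for the p + 1 polynomial coefficients, ones for u.
    D = np.diag(np.r_[np.zeros(p + 1), np.ones(len(knots))])
    theta = np.linalg.solve(B.T @ B + alpha * D, B.T @ np.asarray(y, dtype=float))
    return B @ theta, theta
```

With a very large `alpha` the spline coefficients are shrunk toward zero and the fit approaches the ordinary degree-$p$ polynomial regression; with `alpha` near zero the fit follows the data as closely as the basis allows.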
Linear Mixed Model (LMM) Formulation & BLUPs
Penalized spline can be recast as an LMM with one variance component (Brumback, Ruppert, & Wand, 1999):
$$y = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zu}_{\text{random effects}} + \varepsilon$$
BLUP of $y$ in this context is $\tilde{y} = B\tilde{\theta}$, where
$$\tilde{\theta} = \arg\min_{\theta} \left\{ (y - B\theta)' R^{-1} (y - B\theta) + \frac{\sigma_\varepsilon^2}{\sigma_u^2}\, u'u \right\}.$$
Implies BLUP-optimal value for $\alpha$ is: $\alpha = \sigma_\varepsilon^2 / \sigma_u^2$.
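For reference, the minimization above has a ridge-type closed form. The penalty matrix $D$ is notation introduced here (it does not appear on the slides), with $\theta = (\beta', u')'$ and $B = [X \; Z]$, and the expression assumes that the matrix being inverted is nonsingular.

```latex
% Assumed notation: D = diag(0_{p+1}, 1_K), so that u'u = \theta' D \theta.
\tilde{\theta}
  = \left( B' R^{-1} B + \alpha D \right)^{-1} B' R^{-1} y ,
\qquad
\alpha = \frac{\sigma_\varepsilon^2}{\sigma_u^2} ,
\qquad
\tilde{y} = B \tilde{\theta} .
```

Setting the gradient of $(y - B\theta)' R^{-1} (y - B\theta) + \alpha\, \theta' D \theta$ with respect to $\theta$ to zero gives $(B' R^{-1} B + \alpha D)\,\theta = B' R^{-1} y$, which is the displayed solution.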
Estimation of Smoothing Parameter
Since $\alpha$ is a ratio of variance components in the LMM, many parametric methods are available. There are also several nonparametric methods.
Examples (Parametric)
- Maximum Likelihood (ML)
- REstricted Maximum Likelihood (REML)
Examples (Nonparametric)
- Akaike's Information Criterion (AIC)
- Generalized Cross-Validation (GCV)
A Unified View of Smoothing Parameter Estimators (New)
The above estimators can be viewed as roots of a quadratic estimating equation (QEE) in normal random variables:
$$Q(\alpha) = y' A_\alpha y$$
The $n \times n$ matrix $A_\alpha$ has a (complicated, but) closed-form expression in each case...
Theorem (Paige & Trindade, 2010): the REML QEE is unbiased.
Krivobokova & Kauermann (2007): REML is less sensitive to misspecification of the residual correlation than AIC or GCV.
Saddlepoint-Based Bootstrap (SPBB) Inference for QEEs
Pioneered by Paige, Trindade, & Fernando (2009):
- Relate the distribution of the root of the QEE to that of the estimator.
- Under normality, there is a closed form for the MGF of the QEE.
- Use it to saddlepoint-approximate the distribution of the estimator.
- Now invert the distribution to get a CI... numerically!
- Leads to 2nd-order accurate CIs: coverage error is $O(n^{-1})$.
- Works for: ML, REML, AIC, GCV, etc.!
SPBB: An Approximate Parametric Bootstrap
- Pivoting $F_{\hat{\alpha}}(\hat{\alpha}_{obs})$ gives the CI $(\alpha_L, \alpha_U)$, but $F_{\hat{\alpha}}$ is intractable! (And bootstrap too expensive...)
- $\hat{\alpha}$ solves $Q(\hat{\alpha}) = 0$, and $Q(\alpha)$ is monotone, so
$$F_{\hat{\alpha}}(\hat{\alpha}_{obs}) = F_{Q(\hat{\alpha}_{obs})}(0)$$
- $F_{Q(\hat{\alpha}_{obs})}(0)$ is obtained by saddlepoint approximation via the MGF of $Q(\alpha)$.
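The saddlepoint step can be illustrated for the simplest case. The sketch below assumes $y$ is zero-mean normal with covariance $\Sigma$, so that $Q = y' A y$ has the same distribution as $\sum_i \lambda_i Z_i^2$ with $Z_i$ IID $N(0,1)$ and $\lambda_i$ the eigenvalues of $A\Sigma$, and it uses the standard Lugannani-Rice formula; the slides do not say which saddlepoint formula is used, and the function name is hypothetical.

```python
import math

def saddlepoint_cdf_quadratic_form(lams, q=0.0):
    """Lugannani-Rice approximation to P(Q <= q), Q = sum_i lams[i] * Z_i^2.

    Assumes an indefinite form (eigenvalues of both signs), the case
    needed for evaluating F_Q(0).
    """
    lo_lam, hi_lam = min(lams), max(lams)
    if not (lo_lam < 0.0 < hi_lam):
        raise ValueError("need eigenvalues of both signs")

    # Cumulant generating function K(s) = -1/2 * sum log(1 - 2 s lam), and derivatives.
    def K(s):
        return -0.5 * sum(math.log(1.0 - 2.0 * s * l) for l in lams)

    def K1(s):
        return sum(l / (1.0 - 2.0 * s * l) for l in lams)

    def K2(s):
        return 2.0 * sum((l / (1.0 - 2.0 * s * l)) ** 2 for l in lams)

    # K is finite on (1/(2*lo_lam), 1/(2*hi_lam)); K1 increases from -inf to +inf there,
    # so bisection finds the saddlepoint solving K1(s) = q.
    lo, hi = 1.0 / (2.0 * lo_lam), 1.0 / (2.0 * hi_lam)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < q:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)

    u = s * math.sqrt(K2(s))
    if abs(u) < 1e-6:
        # q is (numerically) the mean of Q: use the limiting form of the formula.
        k2 = 2.0 * sum(l ** 2 for l in lams)
        k3 = 8.0 * sum(l ** 3 for l in lams)
        return 0.5 + k3 / (6.0 * math.sqrt(2.0 * math.pi) * k2 ** 1.5)

    w = math.copysign(math.sqrt(max(2.0 * (s * q - K(s)), 0.0)), s)
    Phi = 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return Phi + phi * (1.0 / w - 1.0 / u)
```

As a sanity check, for eigenvalues `[1, 1, -3, -3]` the event $Q \le 0$ is equivalent to an $F_{2,2}$ variable being at most 3, which has exact probability 0.75; the approximation lands within about 0.01 of that value.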
Exact ML & REML Inference for $\alpha$
Exact finite-sample inference for $\alpha = \sigma_\varepsilon^2 / \sigma_u^2$ in LMMs with one variance component (Crainiceanu, Ruppert, Claeskens, & Wand, 2005):
- Note: the asymptotic $\chi^2$ distribution is a poor approximation in finite samples due to substantial point mass at 0 (Crainiceanu & Ruppert, 2004).
- Invert the (restricted) likelihood ratio test.
- Grid search needed to locate the endpoints of the CI $(\alpha_L, \alpha_U)$.
- Only works for ML & REML...
Simulations: Mimic Extensive Study of Lee (2003)
Simulate datasets of sample size $n = 200$ from curves
$$y = f(x) + \varepsilon, \qquad \varepsilon \sim \text{IID } N(0, \sigma_\varepsilon^2)$$
Vary 3 factors:
- noise level ($\sigma_\varepsilon^2$);
- design density (number of $x$'s);
- spatial variation (type of curve).
Each factor at 3 levels ($j = 1, 3.5, 6$).
Each scenario (factor-level combo) replicated 200 times.
REML-fit linear penalized spline: O-spline basis with 35 knots placed at empirical quantiles of $x \in (0, 1)$ (Wand & Ormerod, 2008).