
Lecture 11: Nonparametric Regression (3): Confidence Bands



  1. Lecture 11: Nonparametric Regression (3): Confidence Bands. Applied Statistics 2015. Outline: Estimation of Variance; Pointwise Confidence Intervals; Simultaneous Confidence Band; Assignment.

  2. An example from Lecture 9: Pick-It Lottery.
[Figure: scatter plot of Payoff against Number (0 to 1000) for the Pick-It lottery data.]
As suggested by the Nadaraya-Watson estimate, numbers less than 100 have larger payoffs and numbers in [200, 300] have smaller payoffs. Question: are these patterns real, or just due to chance? How much random variability is there in the curve? A confidence band can be used to answer questions of this type.

  3. It is customary to assume a fixed design when constructing variance estimators and confidence bands. Let $\hat r(x) = \sum_{i=1}^n l_i(x) Y_i$ be a linear smoother, for instance the Nadaraya-Watson estimator or the local linear estimator. Recall that $Y_i = r(x_i) + \epsilon_i$, and let $\sigma^2 = \mathrm{Var}(\epsilon_i)$. We aim to derive an estimator of $\sigma^2$, which will be needed to build pointwise and simultaneous confidence bands for $r$.
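To make "linear smoother" concrete, here is a minimal pure-Python sketch of the weights $l_i(x)$ for the two estimators named above. The lecture itself uses R. The Gaussian kernel and all function names here are assumptions for illustration. The local linear weights follow the standard form $l_i(x) = b_i(x)/\sum_j b_j(x)$ with $b_i(x) = K((x_i - x)/h)\,(S_2 - (x_i - x)S_1)$ and $S_k = \sum_i K((x_i - x)/h)(x_i - x)^k$.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_weights(x0, xs, h):
    """Nadaraya-Watson weights: l_i(x0) = K((x_i - x0)/h) / sum_j K((x_j - x0)/h)."""
    k = [gaussian_kernel((xi - x0) / h) for xi in xs]
    total = sum(k)
    return [ki / total for ki in k]

def local_linear_weights(x0, xs, h):
    """Local linear weights: l_i(x0) = b_i / sum_j b_j,
    with b_i = K_i * (S2 - d_i * S1), d_i = x_i - x0, S_k = sum_i K_i * d_i**k."""
    k = [gaussian_kernel((xi - x0) / h) for xi in xs]
    d = [xi - x0 for xi in xs]
    s1 = sum(ki * di for ki, di in zip(k, d))
    s2 = sum(ki * di * di for ki, di in zip(k, d))
    b = [ki * (s2 - di * s1) for ki, di in zip(k, d)]
    total = sum(b)
    return [bi / total for bi in b]

def linear_smoother(x0, xs, ys, h, weights=nw_weights):
    """Evaluate r_hat(x0) = sum_i l_i(x0) * Y_i for the chosen weight function."""
    return sum(w * y for w, y in zip(weights(x0, xs, h), ys))

# Usage: on data lying exactly on a line, the local linear weights reproduce it.
xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]
print(linear_smoother(0.37, xs, ys, 0.2, local_linear_weights))
```

In both cases the weights sum to one. The local linear weights also satisfy $\sum_i l_i(x)(x_i - x) = 0$, which is why they reproduce straight lines exactly while the Nadaraya-Watson weights generally do not.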

  4. An Estimator of $\sigma^2$. Observe that $\sigma^2 = \mathrm{E}\,\epsilon_i^2 = \mathrm{E}(Y_i - r(x_i))^2$, so the sample mean $\frac{1}{n}\sum_{i=1}^n (Y_i - r(x_i))^2$ would serve as an estimator of $\sigma^2$. However, $r$, and hence $r(x_i)$, is unknown. We plug in $\hat r(x_i)$ and change the normalizing factor $1/n$. The estimator is given by
$$\hat\sigma^2 = \frac{\sum_{i=1}^n (Y_i - \hat r(x_i))^2}{n - 2\nu + \tilde\nu}, \qquad \nu = \sum_{i=1}^n l_i(x_i), \quad \tilde\nu = \sum_{i=1}^n \sum_{j=1}^n l_i^2(x_j).$$
Equivalently, let $L$ be the $n \times n$ matrix whose entry in the $i$-th row and $j$-th column is $l_j(x_i)$. Then $\nu = \mathrm{trace}(L)$ and $\tilde\nu = \mathrm{trace}(L^T L)$.
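The estimator above can be sketched directly from the smoother matrix $L$. The sketch below is pure Python under assumed choices: Gaussian-kernel Nadaraya-Watson weights and a hypothetical simulated fixed design with $r(x) = \sin(2\pi x)$ and $\sigma = 0.5$. The helper names `variance_estimate` and `simulate` are illustrative only. $\tilde\nu = \mathrm{trace}(L^T L)$ is computed as the sum of all squared entries of $L$.

```python
import math
import random

def nw_weights(x0, xs, h):
    """Gaussian-kernel Nadaraya-Watson weights l_j(x0); the kernel's constant cancels."""
    k = [math.exp(-0.5 * ((xj - x0) / h) ** 2) for xj in xs]
    total = sum(k)
    return [kj / total for kj in k]

def variance_estimate(xs, ys, h):
    """Return (sigma2_hat, nu, nu_tilde) for sigma2_hat = RSS / (n - 2*nu + nu_tilde)."""
    n = len(xs)
    L = [nw_weights(xi, xs, h) for xi in xs]            # L[i][j] = l_j(x_i)
    fitted = [sum(lij * yj for lij, yj in zip(row, ys)) for row in L]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    nu = sum(L[i][i] for i in range(n))                  # trace(L)
    nu_tilde = sum(v * v for row in L for v in row)      # trace(L^T L)
    return rss / (n - 2 * nu + nu_tilde), nu, nu_tilde

def simulate(n, sigma, seed=0):
    """Hypothetical fixed design x_i = (i + 0.5)/n, r(x) = sin(2*pi*x), N(0, sigma^2) noise."""
    rng = random.Random(seed)
    xs = [(i + 0.5) / n for i in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, sigma) for x in xs]
    return xs, ys

xs, ys = simulate(300, 0.5, seed=1)
sigma2_hat, nu, nu_tilde = variance_estimate(xs, ys, h=0.03)
print(round(sigma2_hat, 3), round(nu, 2), round(nu_tilde, 2))  # true sigma^2 is 0.25 here
```

Building the full $n \times n$ matrix costs $O(n^2)$ time and memory. That is fine for a few hundred points, but it is not how a production library would do it.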

  5. An Estimator of $\sigma^2$ (Theorem 5.85 in Wasserman (2005)). Let $\hat r_n(x)$ be a linear smoother. If $r$ is sufficiently smooth, $\nu = o(n)$ and $\tilde\nu = o(n)$, then
$$\hat\sigma^2 = \frac{\sum_{i=1}^n (Y_i - \hat r(x_i))^2}{n - 2\nu + \tilde\nu}$$
is a consistent estimator of $\sigma^2$. It can be shown that
$$\mathrm{E}(\hat\sigma^2) = \sigma^2 + O\!\left(\frac{1}{n - 2\nu + \tilde\nu}\right) \to \sigma^2 \quad \text{as } n \to \infty;$$
see page 86 in Wasserman (2005).

  6. Another Estimator of $\sigma^2$. Another estimator of $\sigma^2$ is due to Rice (1984). Suppose that the $x_i$'s are ordered: $x_1 \le x_2 \le \cdots \le x_n$. Define
$$\hat\sigma_r^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (Y_{i+1} - Y_i)^2.$$
The motivation for this estimator is as follows. Assume that $r(x)$ is smooth and $x_{i+1} - x_i$ is sufficiently small. Then $r(x_{i+1}) - r(x_i) \approx 0$, and hence $Y_{i+1} - Y_i \approx \epsilon_{i+1} - \epsilon_i$. Since $\epsilon_i$ and $\epsilon_{i+1}$ are independent with mean zero, $\mathrm{E}(Y_{i+1} - Y_i)^2 \approx \mathrm{E}(\epsilon_{i+1}^2) + \mathrm{E}(\epsilon_i^2) = 2\sigma^2$. So $\sigma^2 \approx \frac{1}{2}\mathrm{E}(Y_{i+1} - Y_i)^2$, and the estimator is the corresponding sample counterpart.
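The Rice estimator needs no smoother at all: sort by $x$, difference consecutive responses, and average the squares. Below is a minimal pure-Python sketch. The function name `rice_variance` is illustrative, and the random design with $r(x) = \sin(2\pi x)$ and $\sigma = 0.5$ is a hypothetical example, not the lecture's data.

```python
import math
import random

def rice_variance(xs, ys):
    """Rice (1984): sum_{i=1}^{n-1} (Y_{i+1} - Y_i)^2 / (2(n-1)), after sorting by x."""
    y_sorted = [y for _, y in sorted(zip(xs, ys))]
    n = len(y_sorted)
    diffs = (y_sorted[i + 1] - y_sorted[i] for i in range(n - 1))
    return sum(d * d for d in diffs) / (2 * (n - 1))

# Hypothetical random design: x ~ Uniform(0, 1), r(x) = sin(2*pi*x), sigma = 0.5.
rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.5) for x in xs]
print(rice_variance(xs, ys))  # should be close to sigma^2 = 0.25
```

Sorting the $(x, Y)$ pairs together keeps each response attached to its design point. This mirrors `order(x)` in the R code of the next slide.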

  7. An example. The red points represent the data. The black curve is the local linear estimator.
[Figure: "local linear estimator", scatter plot of Y against x on [0, 1] with the fitted curve.]

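  8. R code. The following code produces the plot and the two variance estimates:
```r
library(locfit)  # provides locfit() and lp()

# x and Y are the simulated design points and responses, assumed to exist already
data <- data.frame(x = x, Y = Y)
fit <- locfit(Y ~ lp(x, deg = 1, h = 0.2), data = data)  # local linear fit, bandwidth 0.2
plot(fit, ylim = c(-2, 2), main = 'local linear estimator')
points(data$x, data$Y, pch = 20, col = 'red')

# hat sigma^2: residual-based estimate read from the fitted locfit object
fit$dp[8]

# hat sigma_r^2 (Rice): sort by x, then square and average consecutive differences
ord <- order(x)
Y.ord <- Y[ord]
h_sg2 <- sum(diff(Y.ord)^2) / (2 * (length(Y) - 1))
h_sg2
```
Result: $\hat\sigma^2 = 0.292$ and $\hat\sigma_r^2 = 0.277$, compared with the true value $\sigma^2 = 0.25$.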

  9. Pointwise Confidence Intervals. We aim to construct a confidence interval for $r(x_0)$ based on $\hat r(x_0)$. A direct idea is to find $c$ such that
$$P\left(\frac{|\hat r(x_0) - r(x_0)|}{\sqrt{\mathrm{Var}(\hat r(x_0))}} < c\right) \ge 1 - \alpha.$$
If $T_n = \frac{\hat r(x_0) - r(x_0)}{\sqrt{\mathrm{Var}(\hat r(x_0))}}$ were (asymptotically) standard normal, we could choose $c$ as the corresponding normal quantile. However, this is not the case. Note that we can decompose $T_n$ as follows:
$$T_n = \frac{\hat r(x_0) - \mathrm{E}(\hat r(x_0))}{\sqrt{\mathrm{Var}(\hat r(x_0))}} + \frac{\mathrm{E}(\hat r(x_0)) - r(x_0)}{\sqrt{\mathrm{Var}(\hat r(x_0))}} =: T_{1n} + T_{2n}.$$

  10. Pointwise Confidence Intervals. Now $\hat r(x_0)$ is a weighted sum of independent random variables, so by Lindeberg's central limit theorem $T_{1n} \xrightarrow{d} N(0, 1)$. However, the second term $T_{2n}$ does not vanish. We have $T_{2n}^2 = \text{bias}^2/\text{variance}$, and the optimal bandwidth is chosen precisely to balance the squared bias against the variance. Hence $T_{2n} \not\to 0$. The same holds for bandwidths chosen by common data-driven selectors such as cross-validation.
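To see numerically that $T_{2n}$ does not vanish, the sketch below evaluates it exactly under a fixed design. It uses $\mathrm{E}\,\hat r(x_0) = \sum_i l_i(x_0)\, r(x_i)$ and $\mathrm{Var}\,\hat r(x_0) = \sigma^2 \sum_i l_i^2(x_0)$. The following are all assumptions chosen for illustration, not taken from the lecture:

- the regression function $r(x) = \sin(2\pi x)$, with $\sigma = 0.5$ and $x_0 = 0.25$;
- Gaussian-kernel Nadaraya-Watson weights;
- the bandwidth rule $h = 0.3\,n^{-1/5}$, which shrinks at the MSE-optimal rate.

```python
import math

def nw_weights(x0, xs, h):
    """Gaussian-kernel Nadaraya-Watson weights; the kernel's constant cancels."""
    k = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in xs]
    total = sum(k)
    return [ki / total for ki in k]

def standardized_bias(n, h, x0=0.25, sigma=0.5):
    """Exact T_2n = (E r_hat(x0) - r(x0)) / sd(r_hat(x0)) for the hypothetical
    design x_i = (i + 0.5)/n and r(x) = sin(2*pi*x)."""
    r = lambda x: math.sin(2 * math.pi * x)
    xs = [(i + 0.5) / n for i in range(n)]
    w = nw_weights(x0, xs, h)
    mean = sum(wi * r(xi) for wi, xi in zip(w, xs))    # E r_hat(x0)
    sd = sigma * math.sqrt(sum(wi * wi for wi in w))   # sqrt(sigma^2 * sum_i l_i(x0)^2)
    return (mean - r(x0)) / sd

for n in (200, 800, 3200):
    h = 0.3 * n ** (-1 / 5)
    print(n, round(h, 4), round(standardized_bias(n, h), 2))
```

Under these assumptions, the printed $T_{2n}$ should stay roughly constant as $n$ grows rather than shrinking toward zero. That is the reason the bias cannot simply be dropped from the asymptotics.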

  11. Pointwise Confidence Intervals. We ignore the bias term, so it remains to estimate $\mathrm{Var}(\hat r(x_0))$. Since the $Y_i$ are independent,
$$\mathrm{Var}(\hat r(x_0)) = \mathrm{Var}\Big(\sum_{i=1}^n l_i(x_0) Y_i\Big) = \sum_{i=1}^n l_i^2(x_0)\,\mathrm{Var}(Y_i) = \sigma^2 \sum_{i=1}^n l_i^2(x_0).$$
Write $\sum_{i=1}^n l_i^2(x_0) = \|l(x_0)\|^2$. We estimate the variance by $\widehat{\mathrm{Var}}(\hat r(x_0)) = \|l(x_0)\|^2 \hat\sigma^2$. This is where we need the estimator of $\sigma^2$.
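The variance formula above can be checked by simulation. In a Monte Carlo experiment, the empirical variance of $\hat r(x_0)$ across replicates should match $\sigma^2\|l(x_0)\|^2$. The design, $r(x) = \sin(2\pi x)$, $\sigma = 0.5$, the Gaussian kernel and the helper names are hypothetical. The true $\sigma^2$ is plugged in only for this check; in practice one would use $\hat\sigma^2$.

```python
import math
import random

def nw_weights(x0, xs, h):
    """Gaussian-kernel Nadaraya-Watson weights; the kernel's constant cancels."""
    k = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in xs]
    total = sum(k)
    return [ki / total for ki in k]

def fit_at(x0, xs, ys, h):
    """r_hat(x0) = sum_i l_i(x0) * Y_i."""
    return sum(w * y for w, y in zip(nw_weights(x0, xs, h), ys))

def estimated_fit_variance(x0, xs, h, sigma2_hat):
    """Var_hat(r_hat(x0)) = ||l(x0)||^2 * sigma2_hat."""
    return sum(w * w for w in nw_weights(x0, xs, h)) * sigma2_hat

# Monte Carlo check under a hypothetical fixed design.
rng = random.Random(0)
n, h, sigma, x0 = 100, 0.1, 0.5, 0.5
xs = [(i + 0.5) / n for i in range(n)]
fits = []
for _ in range(4000):
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, sigma) for x in xs]
    fits.append(fit_at(x0, xs, ys, h))
mean_fit = sum(fits) / len(fits)
mc_var = sum((f - mean_fit) ** 2 for f in fits) / (len(fits) - 1)
print(mc_var, estimated_fit_variance(x0, xs, h, sigma ** 2))
```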

  12. Pointwise Confidence Intervals. Via the usual normal-theory approach, we obtain the $(1-\alpha)$ confidence interval
$$\hat r(x_0) \pm z_{\alpha/2}\,\hat\sigma\,\|l(x_0)\|.$$
This is a confidence interval for $\mathrm{E}(\hat r(x_0))$, NOT for $r(x_0)$! Generally, the bias term is ignored and we simply accept that the interval is technically an interval for $\mathrm{E}(\hat r(x_0))$. Alternative approaches include:
- estimating the bias and deriving a bias-corrected CI, cf. Eubank and Speckman (1993);
- using the bootstrap, cf. Härdle and Marron (1991).
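The interval $\hat r(x_0) \pm z_{\alpha/2}\,\hat\sigma\,\|l(x_0)\|$ can be sketched as a small function. It uses `statistics.NormalDist` from the Python standard library for the normal quantile. The Gaussian-kernel weights, the function name `pointwise_ci`, and the simulated data in the usage lines are assumptions. `sigma_hat` is meant to come from either variance estimator above.

```python
import math
import random
from statistics import NormalDist

def nw_weights(x0, xs, h):
    """Gaussian-kernel Nadaraya-Watson weights; the kernel's constant cancels."""
    k = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in xs]
    total = sum(k)
    return [ki / total for ki in k]

def pointwise_ci(x0, xs, ys, h, sigma_hat, alpha=0.05):
    """(1 - alpha) interval r_hat(x0) +/- z_{alpha/2} * sigma_hat * ||l(x0)||.
    Because the bias is ignored, it targets E r_hat(x0), not r(x0)."""
    w = nw_weights(x0, xs, h)
    center = sum(wi * yi for wi, yi in zip(w, ys))
    norm_l = math.sqrt(sum(wi * wi for wi in w))
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 standard normal quantile
    half = z * sigma_hat * norm_l
    return center - half, center, center + half

# Usage on hypothetical data: evaluate the band on a grid of 11 points in [0, 1].
rng = random.Random(2)
xs = [(i + 0.5) / 100 for i in range(100)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.5) for x in xs]
band = [pointwise_ci(g / 10, xs, ys, h=0.1, sigma_hat=0.5) for g in range(11)]
for g, (lo, mid, hi) in enumerate(band):
    print(g / 10, round(lo, 3), round(mid, 3), round(hi, 3))
```

Connecting these intervals across the grid gives a shaded region like the one in the next figures. Each interval has (approximately) the stated coverage only at its own $x_0$; no simultaneous guarantee is implied.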

  13. A simulation example. The black points represent the data. The curve is the local linear estimator. The shaded region corresponds to pointwise 95% confidence intervals.
[Figure: "pointwise confidence intervals", Y against x on [0, 1].]

  14. A simulation example. The black points represent the data. The shaded region corresponds to pointwise 95% confidence intervals. The curve is the true regression function, $r(x)$.
[Figure: "pointwise confidence intervals", Y against x on [0, 1].]
