Lecture 10: Nonparametric Regression (2)
Applied Statistics, 2015

Outline: Consistency of the Nadaraya-Watson estimator; Local linear regression; Assignments
Consistency of the Nadaraya-Watson Estimator

Here we consider the random design: there are $n$ pairs of IID observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ with
\[
Y_i = r(X_i) + \epsilon_i, \qquad i = 1, \ldots, n,
\]
where the $\epsilon_i$'s and $X_i$'s are independent, $\mathrm{E}(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$. Recall that for a chosen smoothing parameter $h_n$ and kernel $K$, the Nadaraya-Watson estimator of $r$ is given by
\[
\hat r_n(x) = \frac{\sum_{i=1}^n K\left(\frac{x - X_i}{h_n}\right) Y_i}{\sum_{i=1}^n K\left(\frac{x - X_i}{h_n}\right)}.
\]
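As a computational companion to this definition, the following sketch (not part of the original slides; it assumes a Gaussian kernel and simulated data with a hypothetical regression function $r(x) = \sin(4\pi x)$) evaluates the Nadaraya-Watson estimator on a grid.

```python
import numpy as np

def nadaraya_watson(x_grid, X, Y, h, kernel=None):
    """Nadaraya-Watson estimate of r at each point of x_grid.

    x_grid : (m,) evaluation points
    X, Y   : (n,) design points and responses
    h      : bandwidth h_n
    kernel : K(u); defaults to the standard Gaussian density
    """
    if kernel is None:
        kernel = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # W[i, j] = K((x_grid[i] - X[j]) / h)
    W = kernel((x_grid[:, None] - X[None, :]) / h)
    return (W @ Y) / W.sum(axis=1)

# Simulated example (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
n = 100
X = rng.uniform(0, 1, n)
Y = np.sin(4 * np.pi * X) + rng.normal(scale=0.3, size=n)  # r(x) = sin(4*pi*x)
x_grid = np.linspace(0, 1, 200)
r_hat = nadaraya_watson(x_grid, X, Y, h=0.05)
```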
Theorem (Consistency of the Nadaraya-Watson Estimator)

Let $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$. Let $f$ denote the density of $X_1$ and let $\mathrm{E}(Y_1^2) < \infty$. Then for any $x_0$ at which $r$ and $f$ are continuous and $f(x_0) > 0$, the Nadaraya-Watson estimator $\hat r_n(x_0)$ is a consistent estimator of $r(x_0)$, that is,
\[
\hat r_n(x_0) \xrightarrow{P} r(x_0), \qquad \text{as } n \to \infty.
\]
Proof of the theorem

To prove this theorem, we need the following result.

Lemma (Theorem 1A in Parzen (1962))
Suppose that $w(y)$ is a bounded and integrable function satisfying $\lim_{y \to \infty} |y\, w(y)| = 0$. Let $g$ be an integrable function. Then for $h_n$ such that $h_n \to 0$ as $n \to \infty$,
\[
\lim_{n \to \infty} \frac{1}{h_n} \int w\!\left(\frac{u - x}{h_n}\right) g(u)\, du = g(x) \int w(u)\, du,
\]
for every continuity point $x$ of $g$.
Proof of the theorem

In the proof we drop the subscript $n$ in $h_n$. Denote
\[
\hat f_n(x_0) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{x_0 - X_i}{h}\right),
\qquad
\hat\psi_n(x_0) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{x_0 - X_i}{h}\right) Y_i.
\]
Then $\hat r_n(x_0) = \hat\psi_n(x_0) / \hat f_n(x_0)$. Note that $\hat f_n(x_0)$ is the kernel density estimator of $f(x_0)$. It suffices to prove that $\hat f_n(x_0) \xrightarrow{P} f(x_0)$ and $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$. We will prove the latter using the lemma; the proof of the former is similar and simpler.
Proof of the theorem: $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$

First we have
\[
\mathrm{E}\,\hat\psi_n(x_0) \overset{\text{IID}}{=} \frac{1}{h}\, \mathrm{E}\left[ K\left(\frac{x_0 - X_1}{h}\right) Y_1 \right]
= \frac{1}{h}\, \mathrm{E}\left[ K\left(\frac{x_0 - X_1}{h}\right) \bigl(r(X_1) + \epsilon_1\bigr) \right]
\overset{\mathrm{E}(\epsilon)=0}{=} \frac{1}{h} \int K\left(\frac{x_0 - x}{h}\right) r(x) f(x)\, dx
\to r(x_0) f(x_0).
\]
Note that the kernel $K$ satisfies the conditions on $w$ in the lemma. The last convergence follows from the lemma applied to $g = r f$ (continuous at $x_0$), together with $\int K(u)\, du = 1$. Similarly we can show that
\[
nh\, \mathrm{Var}\bigl(\hat\psi_n(x_0)\bigr) \to f(x_0) \left( r^2(x_0) + \sigma^2 \right) \int K^2(u)\, du.
\]
Hence
\[
\mathrm{E}\left[ \bigl(\hat\psi_n(x_0) - r(x_0) f(x_0)\bigr)^2 \right]
= \mathrm{Var}\bigl(\hat\psi_n(x_0)\bigr) + \bigl(\mathrm{E}\,\hat\psi_n(x_0) - r(x_0) f(x_0)\bigr)^2 \to 0,
\]
which implies $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$.
MISE of the Nadaraya-Watson estimator

Theorem 5.44 in Wasserman (2005). The mean integrated squared error of the Nadaraya-Watson estimator is
\[
\mathrm{MISE}(\hat r_n) = \frac{h_n^4}{4} \left( \int x^2 K(x)\, dx \right)^2 \int \left( r''(x) + 2\, r'(x) \frac{f'(x)}{f(x)} \right)^2 dx
+ \frac{\sigma^2}{n h_n} \int K^2(x)\, dx \int \frac{1}{f(x)}\, dx
+ o\!\left( h_n^4 + \frac{1}{n h_n} \right).
\]
The first term is the squared bias. The term $r'(x) f'(x)/f(x)$ is called the design bias, as it depends on the design, that is, the distribution of the $X_i$'s. It is known that the kernel estimator has high bias near the boundaries of the data; this is known as boundary bias.
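A standard consequence of this formula (not stated on the slide; a sketch of the usual bandwidth calculation): write the two leading terms as $c_1 h_n^4 + c_2/(n h_n)$, with $c_1 = \frac{1}{4} \left(\int x^2 K(x)\,dx\right)^2 \int \left(r''(x) + 2 r'(x) f'(x)/f(x)\right)^2 dx$ and $c_2 = \sigma^2 \int K^2(x)\,dx \int f(x)^{-1}\,dx$. Setting the derivative with respect to $h_n$ to zero gives $4 c_1 h_n^3 = c_2/(n h_n^2)$, so
\[
h_n^{*} = \left( \frac{c_2}{4 c_1 n} \right)^{1/5} \propto n^{-1/5},
\qquad
\mathrm{MISE}(\hat r_n) = O\!\left(n^{-4/5}\right) \text{ at } h_n = h_n^{*}.
\]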
Boundary bias

The blue curve is the N-W estimate and the black one is the true $r(x)$.

[Figure: Nadaraya-Watson fit ($h = 0.2$, Gaussian kernel) overlaid on the scatter plot of $Y$ against $x$ on $[0, 1]$.]

To alleviate the boundary bias, the so-called local linear regression can be used.
Local linear regression

Suppose that we want to estimate $r(x)$ and $X_i$ is an observation close to $x$. By a Taylor expansion,
\[
r(X_i) \approx r(x) + r'(x)(X_i - x) =: a + b(X_i - x).
\]
Thus the problem of estimating $r(x)$ is equivalent to estimating $a$! Now, we replace $r(X_i)$ with $Y_i$, as we observe $Y_i$ but not $r(X_i)$. We want to find $a$ and $b$ such that $\bigl(Y_i - (a + b(X_i - x))\bigr)^2$ is small. Taking all observations into account, let $\hat a$ and $\hat b$ be given by
\[
(\hat a, \hat b) = \operatorname*{argmin}_{a, b} \sum_{i=1}^n K\left(\frac{x - X_i}{h}\right) \bigl( Y_i - (a + b(X_i - x)) \bigr)^2.
\]
The local linear estimator is defined as $\tilde r_n(x) := \hat a$. Compare it with the Nadaraya-Watson estimator
\[
\hat r_n(x) = \operatorname*{argmin}_{c \in \mathbb{R}} \sum_{i=1}^n K\left(\frac{x - X_i}{h}\right) (Y_i - c)^2.
\]
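To make the definition concrete, here is a small sketch (not from the lecture; it reuses the hypothetical simulated data and Gaussian kernel from the Nadaraya-Watson example above) that computes $\tilde r_n(x)$ by solving the weighted least-squares problem at each evaluation point.

```python
def local_linear(x_grid, X, Y, h, kernel=None):
    """Local linear estimate of r at each point of x_grid via weighted least squares."""
    if kernel is None:
        kernel = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    r_tilde = np.empty_like(x_grid, dtype=float)
    for j, x in enumerate(x_grid):
        k = kernel((x - X) / h)                    # kernel weights k_i
        z = X - x                                  # centered design z_i = X_i - x
        D = np.column_stack([np.ones_like(z), z])  # local model: a + b * z_i
        W = np.diag(k)
        # weighted least squares: minimize sum_i k_i (Y_i - a - b z_i)^2
        a_hat, b_hat = np.linalg.solve(D.T @ W @ D, D.T @ W @ Y)
        r_tilde[j] = a_hat                         # the local linear estimate is the intercept
    return r_tilde

r_tilde = local_linear(x_grid, X, Y, h=0.05)
```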
Local linear regression

Write $L(a, b) = \sum_{i=1}^n K\left(\frac{x - X_i}{h}\right) \bigl(Y_i - (a + b(X_i - x))\bigr)^2$. With $k_i = K\left(\frac{x - X_i}{h}\right)$ and $z_i = X_i - x$, setting the partial derivatives of $L$ to zero gives the normal equations
\[
a \sum_{i=1}^n k_i + b \sum_{i=1}^n k_i z_i - \sum_{i=1}^n k_i Y_i = 0,
\qquad
a \sum_{i=1}^n k_i z_i + b \sum_{i=1}^n k_i z_i^2 - \sum_{i=1}^n k_i Y_i z_i = 0.
\]
Solving them yields $\hat a = \sum_{i=1}^n w_i(x) Y_i \big/ \sum_{i=1}^n w_i(x)$, and thus
\[
\tilde r_n(x) = \sum_{i=1}^n w_i(x) Y_i \Big/ \sum_{i=1}^n w_i(x),
\qquad \text{where } w_i(x) = k_i \left( \sum_{j=1}^n k_j z_j^2 - z_i \sum_{j=1}^n k_j z_j \right).
\]
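The closed-form weights can be implemented directly; the sketch below (again assuming the hypothetical data and Gaussian kernel from the earlier examples) computes $\tilde r_n(x)$ as the weighted average just derived, which agrees, up to rounding, with the weighted least-squares version above.

```python
def local_linear_weights(x, X, h, kernel=None):
    """Closed-form weights w_i(x) = k_i * (sum_j k_j z_j^2 - z_i * sum_j k_j z_j)."""
    if kernel is None:
        kernel = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    k = kernel((x - X) / h)
    z = X - x
    return k * (np.sum(k * z**2) - z * np.sum(k * z))

# tilde r_n(x) as a weighted average of the Y_i
x0 = 0.5
w = local_linear_weights(x0, X, h=0.05)
r_tilde_x0 = np.sum(w * Y) / np.sum(w)
```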
Local linear regression

A linear smoother is an estimator that can be written as the weighted average $\sum_{i=1}^n l_i(x) Y_i$. Clearly the local linear estimator is a linear smoother, and so are the regressogram and the kernel estimator. Like the Nadaraya-Watson estimator, $\tilde r_n(x)$ depends on $h$, so we also need to choose $h$ when using the local linear estimator. Cross-validation can be done in the same manner as for the N-W estimator.
Local linear regression: cross-validation

Write the local linear estimator as $\tilde r_h = \tilde r_{nh}$. The CV score is defined as
\[
CV(h) = \frac{1}{n} \sum_{i=1}^n \left( Y_i - \tilde r^{(i)}_{nh}(X_i) \right)^2,
\]
where $\tilde r^{(i)}_{nh}(X_i)$ is the estimator computed without using the observation $(X_i, Y_i)$. Again, to compute the CV score there is no need to fit the curve $n$ times. We have the following relation, with $l_i(X_i) = w_i(X_i) \big/ \sum_{j=1}^n w_j(X_i)$,
\[
CV(h) = \frac{1}{n} \sum_{i=1}^n \left( \frac{Y_i - \tilde r_{nh}(X_i)}{1 - l_i(X_i)} \right)^2.
\]
Hence $h_{cv} = \operatorname*{argmin}_h \frac{1}{n} \sum_{i=1}^n \left( \frac{Y_i - \tilde r_{nh}(X_i)}{1 - l_i(X_i)} \right)^2$.
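The shortcut formula translates directly into code. The sketch below (a minimal illustration reusing the hypothetical local_linear_weights helper and simulated data from the earlier examples; the bandwidth grid is arbitrary) fits the curve once per bandwidth and selects $h_{cv}$ by grid search.

```python
def cv_score(h, X, Y, kernel=None):
    """Leave-one-out CV score for the local linear estimator via the shortcut formula."""
    n = len(X)
    terms = np.empty(n)
    for i in range(n):
        w = local_linear_weights(X[i], X, h, kernel)  # weights at the design point X_i
        r_fit = np.sum(w * Y) / np.sum(w)             # tilde r_nh(X_i)
        l_ii = w[i] / np.sum(w)                       # l_i(X_i)
        terms[i] = ((Y[i] - r_fit) / (1.0 - l_ii)) ** 2
    return terms.mean()

# Grid search for h_cv over an arbitrary bandwidth grid
h_grid = np.linspace(0.02, 0.3, 30)
scores = [cv_score(h, X, Y) for h in h_grid]
h_cv = h_grid[int(np.argmin(scores))]
```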
Comparison: Nadaraya-Watson estimator vs. local linear estimator

Theorem 5.65 in Wasserman (2005); see also Fan (1992). Let $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$. Under some smoothness conditions on $f(x)$ and $r(x)$, both $\hat r_n(x)$ and $\tilde r_n(x)$ have variance
\[
\frac{\sigma^2}{n h_n f(x)} \int K^2(u)\, du + o\!\left( \frac{1}{n h_n} \right).
\]
The bias of $\hat r_n(x)$ is
\[
h_n^2 \left( \frac{1}{2} r''(x) + \frac{r'(x) f'(x)}{f(x)} \right) \int u^2 K(u)\, du + o(h_n^2),
\]
whereas $\tilde r_n(x)$ has bias $\frac{1}{2} h_n^2\, r''(x) \int u^2 K(u)\, du + o(h_n^2)$.

At boundary points, the N-W estimator typically has high bias because $|f'(x)/f(x)|$ is large there. In this sense, the local linear estimator eliminates the boundary bias and is free from the design bias.
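One way to see the boundary effect numerically is to compare the two fits near the edge of the design. This sketch reuses the hypothetical simulated data and the nadaraya_watson and local_linear functions from the earlier examples; the exact numbers depend on the seed and bandwidth.

```python
edge = np.linspace(0.0, 0.1, 5)           # evaluation points near the left boundary
nw_edge = nadaraya_watson(edge, X, Y, h=0.2)
ll_edge = local_linear(edge, X, Y, h=0.2)
truth = np.sin(4 * np.pi * edge)          # the hypothetical r(x) used in the simulation
print(np.column_stack([edge, truth, nw_edge, ll_edge]))
# With this setup the local linear column typically tracks the truth
# more closely near x = 0 than the Nadaraya-Watson column.
```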