Lecture 10: Nonparametric Regression (2)
Applied Statistics 2015


  1. Outline: consistency of the Nadaraya-Watson estimator; local linear regression; assignments.

  2. Consistency of the Nadaraya-Watson estimator
  Here we consider the random design. There are $n$ pairs of IID observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ with
  $$Y_i = r(X_i) + \epsilon_i, \qquad i = 1, \ldots, n,$$
  where the $\epsilon_i$'s and $X_i$'s are independent, $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$. Recall that for a chosen smoothing parameter $h_n$ and kernel $K$, the Nadaraya-Watson estimator of $r$ is
  $$\hat r_n(x) = \frac{\sum_{i=1}^n K\!\left(\frac{x - X_i}{h_n}\right) Y_i}{\sum_{i=1}^n K\!\left(\frac{x - X_i}{h_n}\right)}.$$
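As an illustration (not part of the slides), a minimal Python sketch of the Nadaraya-Watson estimator with a Gaussian kernel; the function names and the data-generating function in the usage lines are our own choices.

import numpy as np

def gaussian_kernel(u):
    # standard Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_estimate(x, X, Y, h, kernel=gaussian_kernel):
    # Nadaraya-Watson estimate: sum_i K((x - X_i)/h) Y_i / sum_i K((x - X_i)/h)
    # x may be a scalar or an array of evaluation points
    x = np.atleast_1d(np.asarray(x, dtype=float))
    w = kernel((x[:, None] - X[None, :]) / h)   # shape (len(x), n)
    return (w @ Y) / w.sum(axis=1)

# usage on simulated data with an assumed r(x) = sin(2*pi*x)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, 200)
print(nw_estimate(0.5, X, Y, h=0.1))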

  3. Theorem (consistency of the Nadaraya-Watson estimator)
  Let $h_n \to 0$ and $nh = nh_n \to \infty$ as $n \to \infty$. Let $f$ denote the density of $X_1$ and let $E(Y_1^2) < \infty$. Then for any $x_0$ at which $r$ and $f$ are continuous and $f(x_0) > 0$, the Nadaraya-Watson estimator $\hat r_n(x_0)$ is a consistent estimator of $r(x_0)$, that is,
  $$\hat r_n(x_0) \xrightarrow{P} r(x_0), \qquad \text{as } n \to \infty.$$
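To see the theorem at work numerically, here is a small simulation (our own, not from the slides) that evaluates the estimator at a fixed point for growing $n$ with $h_n = n^{-1/5}$; the true regression function and noise level are assumptions made for the illustration.

import numpy as np

def r(x):
    return np.sin(2 * np.pi * x)          # assumed true regression function

rng = np.random.default_rng(1)
x0 = 0.5
for n in [100, 1000, 10000]:
    X = rng.uniform(0, 1, n)
    Y = r(X) + rng.normal(0, 0.3, n)
    h = n ** (-1 / 5)                      # h_n -> 0 while n * h_n -> infinity
    w = np.exp(-0.5 * ((x0 - X) / h) ** 2)   # Gaussian kernel weights
    print(n, (w @ Y) / w.sum(), r(x0))       # estimate vs. r(x0)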

  4. Proof of the theorem
  To prove this theorem, we need the following result.
  Lemma (Theorem 1A in Parzen (1962)). Suppose that $w(y)$ is a bounded and integrable function satisfying $\lim_{y \to \infty} |y\, w(y)| = 0$. Let $g$ be an integrable function. Then for $h_n$ such that $h_n \to 0$ as $n \to \infty$,
  $$\lim_{n \to \infty} \frac{1}{h_n} \int w\!\left(\frac{u - x}{h_n}\right) g(u)\, du = g(x) \int w(u)\, du,$$
  for every continuity point $x$ of $g$.
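A heuristic for why the lemma holds (our own sketch, not on the slide): substituting $v = (u - x)/h_n$, so that $du = h_n\, dv$,
$$\frac{1}{h_n} \int w\!\left(\frac{u - x}{h_n}\right) g(u)\, du = \int w(v)\, g(x + h_n v)\, dv \longrightarrow g(x) \int w(v)\, dv,$$
since $g(x + h_n v) \to g(x)$ at every continuity point $x$ of $g$ as $h_n \to 0$; the boundedness and tail condition on $w$ are what justify passing the limit inside the integral.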

  5. Proof of the theorem
  In the proof we drop the subscript $n$ in $h_n$. Denote
  $$\hat f_n(x_0) = \frac{1}{nh} \sum_{i=1}^n K\!\left(\frac{x_0 - X_i}{h}\right), \qquad \hat\psi_n(x_0) = \frac{1}{nh} \sum_{i=1}^n K\!\left(\frac{x_0 - X_i}{h}\right) Y_i.$$
  Then $\hat r_n(x_0) = \hat\psi_n(x_0) / \hat f_n(x_0)$. Note that $\hat f_n(x_0)$ is the kernel density estimator of $f(x_0)$. It suffices to prove that $\hat f_n(x_0) \xrightarrow{P} f(x_0)$ and $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$. We will prove the latter using the lemma; the proof of the former is similar and simpler.

  6. Proof of the theorem: $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$
  First we have
  $$E\big[\hat\psi_n(x_0)\big] \overset{\text{IID}}{=} \frac{1}{h} E\!\left[ K\!\left(\frac{x_0 - X_1}{h}\right) Y_1 \right] = \frac{1}{h} E\!\left[ K\!\left(\frac{x_0 - X_1}{h}\right) \big(r(X_1) + \epsilon_1\big) \right] \overset{E(\epsilon) = 0}{=} \frac{1}{h} \int K\!\left(\frac{x_0 - x}{h}\right) r(x) f(x)\, dx \to r(x_0) f(x_0).$$
  Note that the kernel $K$ satisfies the conditions on $w$ in the lemma; the last convergence follows from the lemma and the symmetry of $K$. Similarly we can show that
  $$nh\, \mathrm{Var}\big(\hat\psi_n(x_0)\big) \to f(x_0)\big(r^2(x_0) + \sigma^2\big) \int K^2(u)\, du.$$
  Hence $E\big[\big(\hat\psi_n(x_0) - r(x_0) f(x_0)\big)^2\big] \to 0$, and since convergence in mean square implies convergence in probability, $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$.

  7. MISE of the Nadaraya-Watson estimator (Theorem 5.44 in Wasserman (2005))
  The mean integrated squared error of the Nadaraya-Watson estimator is
  $$\mathrm{MISE}(\hat r_n) = \frac{h_n^4}{4} \left( \int x^2 K(x)\, dx \right)^2 \int \left( r''(x) + 2 r'(x) \frac{f'(x)}{f(x)} \right)^2 dx + \frac{\sigma^2 \int K^2(x)\, dx}{n h_n} \int \frac{1}{f(x)}\, dx + o\!\left( h_n^4 + \frac{1}{n h_n} \right).$$
  The first term is the squared bias. The term $r'(x) f'(x)/f(x)$ is called the design bias, as it depends on the design, that is, the distribution of the $X_i$'s. It is known that the kernel estimator has high bias near the boundaries of the data; this is known as boundary bias.
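The slides do not state it, but the familiar optimal rate follows directly from this expansion: writing the two leading terms as $A h_n^4 + B/(n h_n)$ and minimizing over $h_n$,
$$\frac{d}{dh}\left( A h^4 + \frac{B}{nh} \right) = 4 A h^3 - \frac{B}{n h^2} = 0 \quad\Longrightarrow\quad h^{*} = \left( \frac{B}{4 A n} \right)^{1/5} \propto n^{-1/5},$$
so the best achievable MISE of the Nadaraya-Watson estimator decays at the rate $n^{-4/5}$.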

  8. Boundary bias
  The blue curve is the N-W estimate and the black one is the true $r(x)$.
  [Figure: Nadaraya-Watson fit with $h = 0.2$ and Gaussian kernel, plotted over a scatter plot of $Y$ against $x$ on $[0, 1]$.]

  9. Boundary bias (continued)
  Same figure as on the previous slide. To alleviate the boundary bias, the so-called local linear regression can be used.
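The figure can be reproduced in spirit with a short simulation (a sketch under our own assumptions: the true function, noise level, and design below are not taken from the slide):

import numpy as np

def nw(xgrid, X, Y, h):
    # Nadaraya-Watson fit with a Gaussian kernel, evaluated on a grid
    w = np.exp(-0.5 * ((xgrid[:, None] - X[None, :]) / h) ** 2)
    return (w @ Y) / w.sum(axis=1)

rng = np.random.default_rng(2)
r = lambda x: np.sin(2 * np.pi * x)            # assumed true r(x)
X = rng.uniform(0, 1, 100)
Y = r(X) + rng.normal(0, 0.3, 100)
xgrid = np.linspace(0, 1, 101)
fit = nw(xgrid, X, Y, h=0.2)
# the discrepancy is typically largest near the endpoints x = 0 and x = 1
print("error near x=0 :", fit[0] - r(0.0))
print("error near x=1 :", fit[-1] - r(1.0))
print("error at  x=0.5:", fit[50] - r(0.5))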

  10. Local linear regression
  Suppose that we want to estimate $r(x)$ and that $X_i$ is an observation close to $x$. By Taylor expansion,
  $$r(X_i) \approx r(x) + r'(x)(X_i - x) =: a + b(X_i - x).$$
  Thus the problem of estimating $r(x)$ is equivalent to estimating $a$. Now we replace $r(X_i)$ with $Y_i$, since we only observe $Y_i$ and not $r(X_i)$, and we want to find $a$ (and $b$) such that $(Y_i - (a + b(X_i - x)))^2$ is small. Taking all observations into account, let $\hat a$ and $\hat b$ be given by
  $$(\hat a, \hat b) = \arg\min_{a, b} \sum_{i=1}^n K\!\left(\frac{x - X_i}{h}\right) \big(Y_i - (a + b(X_i - x))\big)^2.$$
  The local linear estimator is defined as $\tilde r_n(x) := \hat a$. Compare it with the Nadaraya-Watson estimator
  $$\hat r_n(x) = \arg\min_{c \in \mathbb{R}} \sum_{i=1}^n K\!\left(\frac{x - X_i}{h}\right) (Y_i - c)^2.$$

  11. Local linear regression
  Write $L(a, b) = \sum_{i=1}^n K\!\left(\frac{x - X_i}{h}\right) \big(Y_i - (a + b(X_i - x))\big)^2$. With $k_i = K\!\left(\frac{x - X_i}{h}\right)$ and $z_i = X_i - x$, setting the partial derivatives to zero gives
  $$\frac{\partial L(a, b)}{\partial a} = 0: \quad a \sum_{i=1}^n k_i + b \sum_{i=1}^n k_i z_i - \sum_{i=1}^n k_i Y_i = 0,$$
  $$\frac{\partial L(a, b)}{\partial b} = 0: \quad a \sum_{i=1}^n k_i z_i + b \sum_{i=1}^n k_i z_i^2 - \sum_{i=1}^n k_i Y_i z_i = 0.$$
  Solving this system yields $\hat a = \sum_{i=1}^n w_i(x) Y_i \big/ \sum_{i=1}^n w_i(x)$, and thus
  $$\tilde r_n(x) = \sum_{i=1}^n w_i(x) Y_i \Big/ \sum_{i=1}^n w_i(x), \qquad \text{where } w_i(x) = k_i \left( \sum_{j=1}^n k_j z_j^2 - z_i \sum_{j=1}^n k_j z_j \right).$$
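A minimal Python sketch of these closed-form weights (our own illustration; the Gaussian kernel and the function name are assumptions):

import numpy as np

def local_linear(xgrid, X, Y, h):
    # local linear estimator r_tilde(x) = sum_i w_i(x) Y_i / sum_i w_i(x),
    # with w_i(x) = k_i * (sum_j k_j z_j^2 - z_i * sum_j k_j z_j), z_i = X_i - x
    fits = np.empty(len(xgrid))
    for m, x in enumerate(xgrid):
        z = X - x
        k = np.exp(-0.5 * (z / h) ** 2)                    # kernel weights k_i
        w = k * ((k * z**2).sum() - z * (k * z).sum())     # weights w_i(x)
        fits[m] = (w @ Y) / w.sum()
    return fits

For example, local_linear(np.linspace(0, 1, 101), X, Y, h=0.2) applied to the simulated data from the boundary-bias sketch above gives a fit that tracks the true curve much more closely near x = 0 and x = 1 than the Nadaraya-Watson fit.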

  12. Local linear regression
  A linear smoother is an estimator given by a weighted average of the form $\sum_{i=1}^n l_i(x) Y_i$. Clearly the local linear estimator is a linear smoother, and so are the regressogram and the kernel estimator. Like the Nadaraya-Watson estimator, $\tilde r_n(x)$ depends on $h$, so we also need to choose $h$ when using the local linear estimator. Cross-validation can be done in the same manner as for the N-W estimator.

  13. Local linear regression: cross-validation
  Write the local linear estimator as $\tilde r_h = \tilde r_{nh}$. The CV score is defined as
  $$\mathrm{CV}(h) = \frac{1}{n} \sum_{i=1}^n \big( Y_i - \tilde r^{(i)}_{nh}(X_i) \big)^2,$$
  where $\tilde r^{(i)}_{nh}$ is the estimator computed without using the observation $(X_i, Y_i)$. Again, to compute the CV score there is no need to fit the curve $n$ times: with $l_i(X_i) = w_i(X_i) \big/ \sum_{j=1}^n w_j(X_i)$, we have
  $$\mathrm{CV}(h) = \frac{1}{n} \sum_{i=1}^n \left( \frac{Y_i - \tilde r_{nh}(X_i)}{1 - l_i(X_i)} \right)^2.$$
  Hence $\hat h_{\mathrm{cv}} = \arg\min_h \frac{1}{n} \sum_{i=1}^n \left( \frac{Y_i - \tilde r_{nh}(X_i)}{1 - l_i(X_i)} \right)^2$.
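A Python sketch of this shortcut (our own; it reuses the closed-form local linear weights from the earlier sketch rather than any library routine):

import numpy as np

def loo_cv_score(h, X, Y):
    # CV(h) = (1/n) * sum_i ((Y_i - r_tilde(X_i)) / (1 - l_i(X_i)))^2
    n = len(X)
    resid = np.empty(n)
    for i in range(n):
        z = X - X[i]
        k = np.exp(-0.5 * (z / h) ** 2)                     # Gaussian kernel weights
        w = k * ((k * z**2).sum() - z * (k * z).sum())      # local linear weights w_j(X_i)
        l_ii = w[i] / w.sum()                               # l_i(X_i), weight on (X_i, Y_i)
        fit_i = (w @ Y) / w.sum()                           # r_tilde(X_i) using all the data
        resid[i] = (Y[i] - fit_i) / (1.0 - l_ii)
    return np.mean(resid ** 2)

# choose h_cv by minimising CV(h) over a grid of candidate bandwidths, e.g.
# hs = np.linspace(0.05, 0.5, 20); h_cv = hs[np.argmin([loo_cv_score(h, X, Y) for h in hs])]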

  14. Comparison: Nadaraya-Watson estimator vs. local linear estimator (Theorem 5.65 in Wasserman (2005); see also Fan (1992))
  Let $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$. Under some smoothness conditions on $f(x)$ and $r(x)$, both $\hat r_n(x)$ and $\tilde r_n(x)$ have variance
  $$\frac{\sigma^2}{n h_n f(x)} \int K^2(u)\, du + o\!\left( \frac{1}{n h_n} \right).$$
  The bias of $\hat r_n(x)$ is
  $$h_n^2 \left( \frac{1}{2} r''(x) + \frac{r'(x) f'(x)}{f(x)} \right) \int u^2 K(u)\, du + o(h_n^2),$$
  whereas $\tilde r_n(x)$ has bias $\frac{1}{2} h_n^2\, r''(x) \int u^2 K(u)\, du + o(h_n^2)$.
  At boundary points the N-W estimator typically has high bias due to the large absolute value of $f'(x)/f(x)$; in this sense, local linear estimation eliminates the boundary bias and is free from design bias.
