Lecture 7: Kernel Density Estimation
Applied Statistics 2015

Kernel Density Estimator | Measures of discrepancy | Practical bandwidth choices | Assignments
Histogram

The oldest density estimator is the histogram. Suppose that we have a dissection of the real line into bins; then the estimator is defined by

\[ \hat f_{\mathrm{hist}}(x) = \frac{1}{n} \times \frac{\text{number of } X_i \text{ in the same bin as } x}{\text{width of the bin containing } x}. \]

[Figure: density estimate with the observations marked along the horizontal axis; x-axis: x (from −3 to 4), y-axis: Density (from 0.00 to 0.35).]
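The histogram formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the bin layout (`origin`, `width`) and the small `sample` list are assumptions chosen for the example.

```python
import math

def histogram_density(x, data, origin=0.0, width=0.5):
    """Histogram estimate at x with bins [origin + k*width, origin + (k+1)*width)."""
    k = math.floor((x - origin) / width)  # index of the bin containing x
    # Count the observations that fall in the same bin as x.
    count = sum(1 for xi in data if math.floor((xi - origin) / width) == k)
    return count / (len(data) * width)

# Hypothetical sample, used only for illustration.
sample = [-1.2, -0.4, -0.1, 0.2, 0.3, 0.7, 1.1, 1.6]
print(histogram_density(0.25, sample))  # prints 0.5
```

Two of the eight observations lie in the bin [0, 0.5), so the estimate at 0.25 is (1/8) × 2 / 0.5 = 0.5.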
Naive Density Estimator

Let $X_1, \ldots, X_n$ be a random sample from an unknown distribution function $F$. Suppose that $F$ has a continuous density $f$. How can we estimate $f$ from the sample using non-parametric methods?

Note that $f = h(F) = F'$. Plugging in the empirical distribution function (edf) $\hat F_n$ does not work here, because $\hat F_n$ is a discrete distribution function and does not have a density.

On the other hand,

\[ f(x) = \lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h}. \]

Hence, for small $h$, we consider

\[ \hat f_n(x) = \frac{\hat F_n(x+h) - \hat F_n(x-h)}{2h}. \]

This is called the naive density estimator.
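A minimal Python sketch of the naive estimator, written directly from the difference quotient of the edf. The helper names `edf` and `naive_density` and the `sample` values are illustrative assumptions.

```python
def edf(t, data):
    """Empirical distribution function: fraction of observations <= t."""
    return sum(1 for xi in data if xi <= t) / len(data)

def naive_density(x, data, h):
    """Naive estimator (F_n(x + h) - F_n(x - h)) / (2h)."""
    return (edf(x + h, data) - edf(x - h, data)) / (2.0 * h)

# Hypothetical sample, used only for illustration.
sample = [-1.2, -0.4, -0.1, 0.2, 0.3, 0.7, 1.1, 1.6]
print(naive_density(0.0, sample, h=0.5))  # prints 0.5
```

Here four of the eight observations fall in (−0.5, 0.5], so the estimate at 0 is (4/8) / (2 × 0.5) = 0.5.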
Naive Density Estimator

Theorem. If $h = h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$, then, for any $y$,

\[ \hat f_n(y) \xrightarrow{p} f(y), \quad \text{as } n \to \infty. \]

So $\hat f_n$ is a consistent estimator. Moreover, $\hat f_n$ is a probability density function.

The fact that

\[ n\bigl(\hat F_n(x+h) - \hat F_n(x-h)\bigr) \sim \mathrm{Bin}\bigl(n,\, F(x+h) - F(x-h)\bigr) \]

leads to

\[ E\bigl[\hat f_n(x)\bigr] = \frac{F(x+h) - F(x-h)}{2h}; \]

\[ \mathrm{Var}\bigl[\hat f_n(x)\bigr] = \frac{\bigl(F(x+h) - F(x-h)\bigr)\bigl(1 - F(x+h) + F(x-h)\bigr)}{4 n h^2}. \]
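The mean and variance formulas can be checked by simulation. The sketch below assumes Uniform(0, 1) data, so that $F(x+h) - F(x-h) = 2h$ whenever $[x-h, x+h] \subset [0, 1]$; the seed, sample size and number of replications are arbitrary choices.

```python
import random

def naive_density(x, data, h):
    """Naive estimator: share of observations in (x - h, x + h], divided by 2h."""
    count = sum(1 for xi in data if x - h < xi <= x + h)
    return count / (2.0 * h * len(data))

def exact_moments(p, n, h):
    """Mean and variance implied by n*(F_n(x+h) - F_n(x-h)) ~ Bin(n, p)."""
    return p / (2.0 * h), p * (1.0 - p) / (4.0 * n * h * h)

rng = random.Random(0)          # arbitrary seed (assumption)
n, h, x, reps = 50, 0.1, 0.5, 2000
p = 2.0 * h                     # F(x+h) - F(x-h) for Uniform(0, 1) data
estimates = [naive_density(x, [rng.random() for _ in range(n)], h)
             for _ in range(reps)]
sim_mean = sum(estimates) / reps
sim_var = sum((e - sim_mean) ** 2 for e in estimates) / (reps - 1)
mean, var = exact_moments(p, n, h)
print("exact    :", mean, var)
print("simulated:", sim_mean, sim_var)
```

With these settings the exact mean is 1 and the exact variance is 0.08; the simulated values should be close to these, up to Monte Carlo error.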
Naive Density Estimator

Plots of $\hat f_n$, for $n = 100$ and $h = 0.1$.

[Figure: two panels showing the naive density estimate with the observations marked along the horizontal axis. Left panel: X from −3 to 3. Right panel: X from −2.4 to −1.6. y-axis in both panels: density (from 0.0 to 0.6).]

$\hat f_n$ is a step function, and thus discontinuous. The empirical distribution assigns probability $1/n$ to each observation $X_i$. The naive estimator spreads this probability mass over the interval $[X_i - h, X_i + h]$ according to a uniform distribution. The discontinuities of the uniform density at the end points result in the ragged appearance of $\hat f_n$.
Kernel Density Estimator

For the naive density estimator, we have

\[ \hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} \frac{1}{2}\, I(x - h < X_i \le x + h) = \frac{1}{nh} \sum_{i=1}^{n} \frac{1}{2}\, 1_{[-1,1)}\!\left(\frac{x - X_i}{h}\right). \]

A kernel is a continuous and symmetric (around zero) density function satisfying

\[ \int x^2 K(x)\,dx < \infty \quad \text{and} \quad \int K^2(x)\,dx < \infty. \]

In the construction of $\hat f_n$, replacing the uniform density $\frac{1}{2} 1_{[-1,1)}(\cdot)$ by a kernel $K(\cdot)$, we obtain a kernel density estimator of $f$:

\[ \hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right). \]

$h$ is called the bandwidth. It is a smoothing parameter.
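The kernel density estimator translates almost literally into code. The sketch below is an illustration under assumptions: the Gaussian kernel is used as one possible choice of $K$, and the `sample` values, bandwidth and integration grid are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, one possible choice of kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h, kernel=gaussian_kernel):
    """Kernel density estimate (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

# Hypothetical sample, used only for illustration.
sample = [-1.2, -0.4, -0.1, 0.2, 0.3, 0.7, 1.1, 1.6]

# Riemann sum over a wide grid: the estimate should integrate to about 1.
grid = [-6.0 + 0.01 * j for j in range(1201)]
area = sum(kde(x, sample, h=0.4) for x in grid) * 0.01
print(round(area, 4))
```

Passing the uniform density on $[-1, 1)$ as `kernel` recovers the naive estimator, which is a convenient sanity check on the implementation.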
Kernel Density Estimator

The estimator depends on the kernel $K$ and the bandwidth $h$. The following are some commonly used kernels.

Epanechnikov: $\frac{3}{4}(1 - u^2)\, 1_{[-1,1]}(u)$
Biweight: $\frac{15}{16}(1 - u^2)^2\, 1_{[-1,1]}(u)$
Gaussian: $\frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{1}{2} u^2\right)$
Triangular: $(1 - |u|)\, 1_{[-1,1]}(u)$
Uniform: $\frac{1}{2}\, 1_{[-1,1]}(u)$
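As a check on the kernel formulas, the following sketch defines each kernel in Python and verifies numerically that it integrates to one, also reporting the second moment $\int u^2 K(u)\,du$. The dictionary name `KERNELS`, the helper `integrate` and the integration range are assumptions made for this illustration.

```python
import math

KERNELS = {
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "biweight": lambda u: 15.0 / 16.0 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "triangular": lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "uniform": lambda u: 0.5 if abs(u) <= 1 else 0.0,
}

def integrate(g, a=-8.0, b=8.0, steps=16000):
    """Midpoint rule for the integral of g over [a, b]."""
    width = (b - a) / steps
    return sum(g(a + (j + 0.5) * width) for j in range(steps)) * width

for name, K in KERNELS.items():
    total = integrate(K)                           # should be close to 1
    second = integrate(lambda u: u * u * K(u))     # integral of u^2 K(u) du
    print(f"{name:13s} integral={total:.4f} second moment={second:.4f}")
```

The second moments should come out near 1/5, 1/7, 1, 1/6 and 1/3 respectively, which are the standard values for these kernels.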
Example 1

Data: 100 observations from exp(1).
Kernel: Epanechnikov.
Comparing the effect of different bandwidths. The black curve indicates the real density.

[Figure: kernel density estimator; legend entries: real, 0.05, 0.15, 0.5; x-axis: x (from 0 to 10), y-axis: density (from 0.0 to 1.0); observations marked along the horizontal axis.]

A larger bandwidth leads to a smoother but more biased estimator.
Example 2

Data: 100 observations from exp(1).
Kernel: Gaussian.
Comparing the effect of different bandwidths. The black curve indicates the real density.

[Figure: kernel density estimator; legend entries: real, 0.05, 0.15, 0.5; x-axis: x (from 0 to 6), y-axis: density (from 0.0 to 1.0); observations marked along the horizontal axis.]

A larger bandwidth leads to a smoother but more biased estimator.
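The effect of the bandwidth can be reproduced in a small experiment. The sketch below is not the code behind the figures: it simulates its own 100 observations from exp(1) with an arbitrary seed, uses the Epanechnikov kernel, and measures wiggliness with a crude, hypothetical `roughness` score (sum of squared second differences on a grid).

```python
import math
import random

def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def kde(x, data, h, kernel=epanechnikov):
    """Kernel density estimate (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

rng = random.Random(2015)                        # arbitrary seed (assumption)
data = [rng.expovariate(1.0) for _ in range(100)]
grid = [0.5 + 0.05 * j for j in range(51)]       # evaluation points in [0.5, 3]

def roughness(h):
    """Sum of squared second differences of the estimate on the grid."""
    y = [kde(x, data, h) for x in grid]
    return sum((y[j + 1] - 2 * y[j] + y[j - 1]) ** 2
               for j in range(1, len(y) - 1))

for h in (0.05, 0.15, 0.5):
    # Columns: bandwidth, roughness, estimate at x = 1, true density exp(-1).
    print(h, roughness(h), kde(1.0, data, h), math.exp(-1.0))
```

The roughness score should drop sharply as the bandwidth grows from 0.05 to 0.5, matching the visual impression of a smoother curve.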
To see the performance of the estimator, consider the bias and the mean square error of $\hat f_n(x)$ for fixed $x$.

Theorem. Let $f$ be three times differentiable with bounded third derivative in a neighborhood of $x$. Let the kernel $K$ satisfy $\int |y|^3 K(y)\,dy < \infty$. If $\lim_{n \to \infty} h_n = 0$, then, with $\tau^2 = \int y^2 K(y)\,dy$,

\[ E\bigl[\hat f_n(x)\bigr] - f(x) = \frac{1}{2} h_n^2 f''(x)\, \tau^2 + o(h_n^2). \]

If in addition $\lim_{n \to \infty} n h_n = \infty$, then

\[ \mathrm{Var}\bigl[\hat f_n(x)\bigr] = \frac{1}{n h_n} f(x) \int K^2(y)\,dy + o\!\left(\frac{1}{n h_n}\right). \]

Thus,

\[ \mathrm{MSE}\bigl(\hat f_n(x)\bigr) = E\bigl(\hat f_n(x) - f(x)\bigr)^2 = \frac{1}{4} h_n^4 \bigl(f''(x)\bigr)^2 \tau^4 + \frac{f(x)}{n h_n} \int K^2(y)\,dy + o\!\left(h_n^4 + \frac{1}{n h_n}\right). \]
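To get a feel for the two leading MSE terms, the sketch below evaluates them in one concrete case. The setting is an assumption chosen because everything is available in closed form: $f$ is the standard normal density, $K$ is the Gaussian kernel (so $\tau^2 = 1$ and $\int K^2(y)\,dy = 1/(2\sqrt{\pi})$), and $n = 100$.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

TAU2 = 1.0                               # integral of y^2 K(y) dy, Gaussian kernel
R_K = 1.0 / (2.0 * math.sqrt(math.pi))   # integral of K(y)^2 dy, Gaussian kernel

def amse(x, h, n):
    """Leading MSE terms (squared bias plus variance) for f = N(0, 1)."""
    f = phi(x)
    f2 = (x * x - 1.0) * phi(x)          # second derivative of the N(0, 1) density
    bias = 0.5 * h * h * f2 * TAU2
    variance = f * R_K / (n * h)
    return bias ** 2 + variance

for h in (0.1, 0.3, 0.6, 1.0):
    print(h, amse(0.0, h, n=100))
```

Among these four bandwidths the value at $h = 0.3$ is the smallest: for small $h$ the variance term dominates, and for large $h$ the squared bias does.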
Optimal bandwidth: bias-variance trade off

Observe that as $h$ increases, the bias becomes large while the variance decreases. In order to find the optimal value of $h$, we minimize the leading terms of the MSE with respect to $h$. Setting the derivative to zero leads to

\[ h_{\mathrm{opt}} = \left( \frac{f(x) \int K^2(z)\,dz}{\bigl(f''(x)\bigr)^2 \tau^4} \right)^{1/5} \frac{1}{n^{1/5}}. \]

It follows that the corresponding MSE and variance are both of the order $n^{-4/5}$.
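The optimal bandwidth formula and the $n^{-4/5}$ rate can be illustrated numerically. The sketch below again assumes, purely for illustration, that $f$ is the standard normal density evaluated at $x = 0$ and that $K$ is the Gaussian kernel; the function names are hypothetical.

```python
import math

def h_opt(f_x, f2_x, R_K, tau2, n):
    """Pointwise optimal bandwidth (f(x) R(K) / (f''(x)^2 tau^4))^(1/5) * n^(-1/5)."""
    return (f_x * R_K / (f2_x ** 2 * tau2 ** 2)) ** 0.2 * n ** (-0.2)

def amse(h, f_x, f2_x, R_K, tau2, n):
    """Leading MSE terms: squared bias plus variance."""
    return 0.25 * h ** 4 * f2_x ** 2 * tau2 ** 2 + f_x * R_K / (n * h)

# Illustrative case (an assumption): f = N(0, 1) at x = 0, Gaussian kernel.
f0 = 1.0 / math.sqrt(2 * math.pi)        # f(0)
f2 = -f0                                 # f''(0) = (0^2 - 1) * f(0)
R_K, tau2 = 1.0 / (2.0 * math.sqrt(math.pi)), 1.0

for n in (100, 1000, 10000):
    h = h_opt(f0, f2, R_K, tau2, n)
    # Last column: MSE rescaled by n^(4/5), which should not depend on n.
    print(n, round(h, 4), amse(h, f0, f2, R_K, tau2, n) * n ** 0.8)
```

The rescaled MSE in the last column stays constant as $n$ grows, which is the numerical counterpart of the $n^{-4/5}$ rate.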