Lecture 8: Kernel Density Estimation (2)
Applied Statistics 2015

Outline: Choice of bandwidth by Cross Validation · Multivariate density estimators · Assignments
Recap

A kernel density estimator is given by
$$\hat f_{n,h}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right).$$
The risk of the estimator is measured locally by the MSE and globally by the IMSE. Both have the following decomposition:
$$\text{Risk} = (\text{Bias})^2 + \text{Variance} = a h^4 + \frac{b}{nh} + \text{remaining term}.$$
Minimizing the risk yields the optimal bandwidth $h_{\text{opt}}$ of order $n^{-1/5}$.
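As a refresher on how this estimator is computed in practice, here is a minimal sketch with a Gaussian kernel. The function names, the simulated sample, and the evaluation grid are illustrative choices, not part of the lecture.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, data, h):
    """f_hat_{n,h}(x) = (1/(n*h)) * sum_i K((x - X_i)/h).

    x may be a scalar or a 1-D array of evaluation points.
    """
    x = np.atleast_1d(x)
    n = len(data)
    u = (x[:, None] - data[None, :]) / h   # scaled differences, shape (len(x), n)
    return gaussian_kernel(u).sum(axis=1) / (n * h)

# Illustration on a small simulated sample
rng = np.random.default_rng(0)
sample = rng.normal(size=100)
grid = np.linspace(-4, 4, 201)
f_hat = kde(grid, sample, h=0.4)
```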
Recap

The trade-off between bias and variance is a common issue in smoothing problems. The bias increases and the variance decreases with the amount of smoothing, which is determined by the bandwidth $h$ in the kernel density estimator.
Choice of bandwidth by Cross Validation

The general concept of cross validation (CV) was introduced in Stone (1974); it was not originally proposed for density estimation. The basic idea of CV is very intuitive: select a part of the data to fit the model, then apply the fitted model to the rest of the data to assess goodness of fit.

For choosing the bandwidth of a density estimator, the procedure works as follows (a code sketch follows this list):
- Fix $h$.
- For each $j$, obtain the estimator based on the $(n-1)$ observations $\{X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n\}$, denoted by
$$\hat f^{(j)}_{n,h}(x) = \frac{1}{(n-1)h} \sum_{i \neq j} K\left(\frac{x - X_i}{h}\right).$$
- A CV score, as a measure of goodness of fit, is computed from $\{\hat f^{(j)}_{n,h}(X_j),\ j = 1, \ldots, n\}$.
- Varying $h$ yields a function $CV(h)$, which is then maximized (or minimized) to obtain a CV bandwidth.
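A minimal sketch of the leave-one-out step, reusing the kde helper from the recap sketch above: for each $j$ the density is estimated from the remaining $n - 1$ points and evaluated at $X_j$. These $n$ values are the ingredients of every CV score discussed below.

```python
def loo_values(data, h):
    """Return the n leave-one-out values f_hat^{(j)}_{n,h}(X_j)."""
    n = len(data)
    vals = np.empty(n)
    for j in range(n):
        rest = np.delete(data, j)          # the (n-1) observations without X_j
        vals[j] = kde(data[j], rest, h)[0]
    return vals
```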
Maximum Likelihood CV

Let
$$\hat f^{(j)}_{n,h}(x) = \frac{1}{(n-1)h} \sum_{i \neq j} K\left(\frac{x - X_i}{h}\right)$$
be the estimated density based on all sample values except $X_j$. We apply the estimate $\hat f^{(j)}_{n,h}(x)$ to $x = X_j$ to obtain $\hat f^{(j)}_{n,h}(X_j)$. Since $X_j$ was actually observed, a good choice of $h$ should give a large value of $\hat f^{(j)}_{n,h}(X_j)$. The rationale is similar to that of MLE. Define the CV likelihood as
$$\hat L(h) = \prod_{j=1}^{n} \hat f^{(j)}_{n,h}(X_j).$$
The maximum likelihood CV (MLCV) bandwidth is given by $h_{ML} = \operatorname{argmax}_h \hat L(h)$.
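A sketch of MLCV via a simple grid search, maximizing the log of the CV likelihood and reusing loo_values from above; the bandwidth grid and the numerical floor are illustrative choices.

```python
def mlcv_bandwidth(data, h_grid):
    """h_ML = argmax_h sum_j log f_hat^{(j)}_{n,h}(X_j) over a grid of h."""
    scores = []
    for h in h_grid:
        vals = loo_values(data, h)
        # Floor guards against log(0); leave-one-out values can be exactly
        # zero for tiny h with a compactly supported kernel.
        scores.append(np.sum(np.log(np.maximum(vals, 1e-300))))
    return h_grid[int(np.argmax(scores))]

h_ml = mlcv_bandwidth(sample, np.linspace(0.05, 1.5, 60))
```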
Maximum Likelihood CV

It can be proven that, under some conditions on $f$ and $K$,
$$\int \left|\hat f_{n,h_{ML}}(x) - f(x)\right| dx \xrightarrow{a.s.} 0.$$

Remark. There are known examples of inconsistency of $\hat f_{n,h_{ML}}$ when $f$ has unbounded support.
Least squares CV

Consider
$$MISE(\hat f_{n,h}) = E\left[\int (\hat f_{n,h}(x) - f(x))^2\,dx\right] = E\left[\int \hat f^2_{n,h}(x)\,dx\right] - 2\,E\left[\int \hat f_{n,h}(x) f(x)\,dx\right] + \int f^2(x)\,dx.$$
The last term does not depend on $h$. Thus we aim to find a good $h$ that minimizes
$$M(h) = E\left[\int \hat f^2_{n,h}(x)\,dx\right] - 2\,E\left[\int \hat f_{n,h}(x) f(x)\,dx\right].$$
However, $M(h)$ depends on the unknown $f$. We shall find an unbiased estimator of $M(h)$. We only need to find an unbiased estimator of $E\left[\int \hat f_{n,h}(x) f(x)\,dx\right]$.
Least squares CV

It turns out that
$$\frac{1}{n} \sum_{j=1}^{n} \hat f^{(j)}_{n,h}(X_j)$$
is an unbiased estimator of $E\left[\int \hat f_{n,h}(x) f(x)\,dx\right]$:
$$E\left[\frac{1}{n} \sum_{j=1}^{n} \hat f^{(j)}_{n,h}(X_j)\right] = \frac{1}{n} \sum_{j=1}^{n} E\left[\hat f^{(j)}_{n,h}(X_j)\right] = E\left[\hat f^{(1)}_{n,h}(X_1)\right]$$
$$= E\left[\frac{1}{(n-1)h} \sum_{i=2}^{n} K\left(\frac{X_1 - X_i}{h}\right)\right] = E\left[\frac{1}{h} K\left(\frac{X_1 - X_2}{h}\right)\right]$$
$$= \iint \frac{1}{h} K\left(\frac{x - y}{h}\right) f(y) f(x)\,dy\,dx = \int \left[\int \frac{1}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy\right] f(x)\,dx$$
$$= \int E\left[\hat f_{n,h}(x)\right] f(x)\,dx = E\left[\int \hat f_{n,h}(x) f(x)\,dx\right].$$
Least squares CV

Let
$$LSCV(h) = \int \hat f^2_{n,h}(x)\,dx - \frac{2}{n} \sum_{j=1}^{n} \hat f^{(j)}_{n,h}(X_j).$$
We have shown that for any $h > 0$, $E(LSCV(h)) = M(h)$. $LSCV(h)$ is the least squares cross validation score. The LSCV bandwidth is defined as $h_{ls} = \operatorname{argmin}_h LSCV(h)$. For a given $h$, $LSCV(h)$ can be computed from the sample. A computational formula for $LSCV(h)$:
$$LSCV(h) = \frac{1}{n^2 h} \sum_{i=1}^{n} \sum_{j=1}^{n} \int K(y)\, K\left(\frac{X_i - X_j}{h} - y\right) dy \;-\; \frac{2}{n(n-1)h} \sum_{j=1}^{n} \sum_{i \neq j} K\left(\frac{X_i - X_j}{h}\right).$$
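For the Gaussian kernel, the convolution $\int K(y) K(t - y)\,dy$ in the first term is simply the $N(0, 2)$ density, which makes the computational formula easy to implement. A sketch, reusing gaussian_kernel and sample from the earlier sketches:

```python
def lscv_score(data, h):
    """LSCV(h) for the Gaussian kernel, via the computational formula."""
    n = len(data)
    t = (data[:, None] - data[None, :]) / h      # (X_i - X_j)/h for all pairs
    # First term: for the Gaussian kernel, (K * K)(t) is the N(0, 2) density.
    conv = np.exp(-0.25 * t**2) / np.sqrt(4 * np.pi)
    term1 = conv.sum() / (n**2 * h)
    # Second term: kernel evaluations over the off-diagonal pairs (i != j);
    # subtracting n * K(0) removes the diagonal from the full sum.
    k = gaussian_kernel(t)
    term2 = 2 * (k.sum() - n * gaussian_kernel(0.0)) / (n * (n - 1) * h)
    return term1 - term2

h_grid = np.linspace(0.05, 1.5, 60)
h_ls = h_grid[int(np.argmin([lscv_score(sample, h) for h in h_grid]))]
```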
Least squares CV

The resulting bandwidth $h_{ls}$, and thus the density estimator $\hat f_{n,h_{ls}}(x)$, are asymptotically optimal.

Theorem (Stone, 1984). Assume the following:
(a) $f$ is uniformly bounded.
(b) $K$ is a kernel (so a density symmetric around zero) with zero as its unique mode.
(c) $K$ is compactly supported.
(d) $K$ is Hölder continuous of order $\beta$; i.e. for $x_1, x_2 \in \mathbb{R}$, $|K(x_1) - K(x_2)| \le \text{constant} \cdot |x_1 - x_2|^\beta$.
Then
$$\frac{\int (\hat f_{n,h_{ls}}(x) - f(x))^2\,dx}{\int (\hat f_{n,h_{opt}}(x) - f(x))^2\,dx} \xrightarrow{a.s.} 1.$$

Remark. This result is regarded as a landmark in the cross-validation literature. The theorem asserts optimal performance of the LSCV bandwidth with practically no conditions on $f$.
A few comments

All methods for choosing the smoothing parameter $h$ should be used with common sense. Recommended methods: the reference bandwidth and cross validation approaches. In practice, always make plots and compare different choices of $h$.
Multivariate density estimators

On the basis of $n$ i.i.d. random vectors $\mathbf{X}_i = (X_{i1}, \ldots, X_{id})$ from an unknown $F$, we wish to estimate $f$, the density of $F$. (A bold letter denotes a vector in this section.) We consider $d$-dimensional kernel estimators: for $\mathbf{x} = (x_1, \ldots, x_d) \in \mathbb{R}^d$,
$$\hat f_n(\mathbf{x}) = \frac{1}{nh^d} \sum_{i=1}^{n} K\left(\frac{\mathbf{x} - \mathbf{X}_i}{h}\right),$$
where the kernel $K$ is a $d$-dimensional density. In practice, $K$ is often taken to be a product kernel or an ellipsoidal kernel.
Multivariate density estimators

Product kernel: $K(\mathbf{x}) = \prod_{i=1}^{d} K_0(x_i)$, with $K_0$ a univariate kernel.

Ellipsoidal kernels:
- Multivariate normal density function: $(2\pi)^{-d/2} \exp\left(-\tfrac{1}{2}\mathbf{x}\mathbf{x}'\right)$.
- Multivariate Epanechnikov kernel: $\frac{d+2}{2c_d}\,(1 - \mathbf{x}\mathbf{x}')\,\mathbf{1}\{\mathbf{x}\mathbf{x}' \le 1\}$, where $c_d$ is the volume of the $d$-dimensional unit ball: $c_1 = 2$, $c_2 = \pi$, $c_3 = 4\pi/3$.

One can also choose different amounts of smoothing along different directions: $\mathbf{h} = (h_1, \ldots, h_d)$. A sketch of the product-kernel estimator follows.
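A minimal sketch of the $d$-dimensional estimator with a Gaussian product kernel, reusing gaussian_kernel and rng from the univariate sketch; h may be a scalar or a vector $(h_1, \ldots, h_d)$ of per-direction bandwidths.

```python
def kde_multivariate(x, data, h):
    """Product-kernel estimate at a single point x in R^d; data has shape (n, d)."""
    n, d = data.shape
    h = np.broadcast_to(np.asarray(h, dtype=float), (d,))
    u = (x[None, :] - data) / h                 # shape (n, d)
    # Product kernel: multiply univariate kernels across the d coordinates.
    k = gaussian_kernel(u).prod(axis=1)
    return k.sum() / (n * h.prod())

# Example in d = 2, with a common bandwidth in both directions
data2 = rng.normal(size=(500, 2))
f0 = kde_multivariate(np.zeros(2), data2, h=0.5)
```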
Multivariate density estimators

Assume that $h_i \to 0$ for $i = 1, \ldots, d$ and $n \prod_{i=1}^{d} h_i \to \infty$ as $n \to \infty$. Under some smoothness conditions on $f$ and $K$, $\hat f_n(\mathbf{x})$ is a consistent estimator of $f(\mathbf{x})$:
$$\hat f_n(\mathbf{x}) \xrightarrow{P} f(\mathbf{x}).$$
The optimal bandwidth $h_{i,\text{opt}}$ is $c_i\, n^{-1/(d+4)}$, $i = 1, \ldots, d$, and the corresponding risk (MSE or MISE) tends to zero at the rate $n^{-4/(d+4)}$.
Curse of dimensionality

This refers to the situation that an (estimation) problem gets harder very quickly as the dimension of the data increases. This can be due to computational burden and/or statistical efficiency. We discuss here the statistical curse of dimensionality: to obtain an accurate estimator, an enormous sample size is required. We have
$$MSE_{h_{\text{opt}}} \approx c\, n^{-4/(d+4)}.$$
Set $MSE_{h_{\text{opt}}} = \delta$ and solve for $n$:
$$n \approx \left(\frac{c}{\delta}\right)^{(d+4)/4},$$
which grows exponentially with the dimension $d$; a quick numerical check follows. We then illustrate this phenomenon with two examples.
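A quick numerical check of this growth; the ratio $c/\delta = 10$ is an arbitrary illustrative value, not a constant from the lecture.

```python
# n ≈ (c/δ)^((d+4)/4); each extra dimension multiplies n by (c/δ)^(1/4).
ratio = 10  # assumed value of c/δ, purely for illustration
for d in (1, 2, 4, 8, 16):
    print(d, round(ratio ** ((d + 4) / 4)))
```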
Curse of dimensionality: 1st example

Suppose that the data are multivariate Gaussian $N(\mathbf{0}, I_d)$, with $I_d$ the identity matrix. Choose the optimal $h$ and a Gaussian kernel to estimate $f(\mathbf{0})$. To achieve
$$\frac{E\left[\left(\hat f_n(\mathbf{0}) - f(\mathbf{0})\right)^2\right]}{f^2(\mathbf{0})} < 0.1,$$
the required number of observations $n$ is as in the following table (Table 4.2 of Silverman (1986)):

d:   2    4     6       8        10
n:  19  223  2790  43,700  842,000
Curse of dimensionality

Why does an accurate estimator require a large sample size in the multivariate case? The reason is that $f(\mathbf{x})$ is estimated using data points in a local neighborhood of $\mathbf{x}$. But in a high-dimensional setting the data are very sparse, so local neighborhoods contain very few points.

2nd example. Suppose that we have $n$ data points uniformly distributed on the interval $[0, 1]$. How many data points will be in the interval $[0, 0.1]$? The answer is around $n/10$. Now suppose $n$ data points are uniformly distributed on the 10-dimensional unit cube $[0,1]^{10} = [0,1] \times \cdots \times [0,1]$. How many data points will be in the cube $[0, 0.1]^{10}$? About $n \cdot (0.1)^{10} = n/10^{10}$; to expect even a single point there we would need $n = 10{,}000{,}000{,}000$. A simulation sketch follows.
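The sparsity behind this example is easy to verify by simulation; a sketch (reusing rng from the earlier sketches) counting uniform points that land in the corner cube $[0, 0.1]^d$:

```python
# On average n * 0.1**d points fall in [0, 0.1]^d.
n = 100_000
for d in (1, 2, 10):
    pts = rng.uniform(size=(n, d))
    inside = int(np.all(pts < 0.1, axis=1).sum())
    print(f"d={d}: {inside} of {n} points in [0, 0.1]^{d}")
# For d = 10 the expected count is n / 10**10, i.e. essentially zero
# unless n is of the order 10,000,000,000.
```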