Nonparametric Methods
Steven J Zeil
Old Dominion Univ.
Fall 2010

Outline
1. Density Estimation
   - Nonparametric Methods
   - Bins
   - Kernel Estimators
   - k-Nearest Neighbor
   - Multivariate Data
2. Classification
3. Regression

Nonparametric Methods
- Used when we cannot make assumptions about the distribution of the data, but still want to apply methods similar to the ones we have already learned.
- Assumption: similar inputs have similar outputs.
- Secondary assumption: key functions (e.g., pdf, discriminants) change smoothly.

Density Estimation
- Given a training set X, can we estimate the sample distribution from the data itself?
- The trick will be coming up with useful summaries that do not require us to retain the entire training set after training.
Density Estimation Classification Regression Density Estimation Classification Regression Bins Histogram Divide data into bins of size h p ( x ) = # { x t in same bin as x } Histogram: ˆ Nh Naive Estimator: Solves problems of origin and exact placement of bin boundaries p ( x ) = # { x − h < x t ≤ x + h } ˆ 2 Nh This can be rewritten as N � x − x t p ( x ) = 1 � � ˆ w Nh h t =1 where w is a weight function � 1 / 2 if | u | < 1 w ( u ) = 0 otherwise 5 6 Density Estimation Classification Regression Density Estimation Classification Regression Naive Estimator Kernel Estimators We can generalize the idea of the weighting function to Kernel function, e.g., Gaussian kernel 1 e − u 2 / 2 K ( u ) = √ 2 π Kernel estimator (a.k.a. Parzen windows) N � x − x t � p ( x ) = 1 � ˆ K Nh h t =1 7 8
(Figure: Kernels)

k-Nearest Neighbor Estimator
- Instead of fixing the bin width h and counting the number of neighboring instances, fix the number of neighbors k and compute the bin width:

    \hat{p}(x) = \frac{k}{2 N d_k(x)}

  where d_k(x) is the distance to the k-th closest instance to x.

(Figure: k-Nearest Neighbor)

Multivariate Data
- Kernel estimator:

    \hat{p}(\vec{x}) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{\vec{x} - \vec{x}^t}{h}\right)

- Multivariate Gaussian kernel (spherical):

    K(\vec{u}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left(-\frac{\|\vec{u}\|^2}{2}\right)

- Multivariate Gaussian kernel (ellipsoid):

    K(\vec{u}) = \frac{1}{(2\pi)^{d/2}\,|S|^{1/2}} \exp\!\left(-\frac{1}{2}\,\vec{u}^T S^{-1} \vec{u}\right)
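A sketch of the multivariate kernel estimator with the spherical Gaussian kernel above; the function and variable names are assumptions made for illustration:

    import numpy as np

    # Illustrative sketch; names are assumptions, not from the slides.
    def multivariate_kde(x, data, h):
        """Spherical Gaussian kernel density estimate at one d-dimensional query point x."""
        data = np.asarray(data, dtype=float)   # training set, shape (N, d)
        x = np.asarray(x, dtype=float)         # query point, shape (d,)
        N, d = data.shape
        u = (x - data) / h                     # scaled differences, shape (N, d)
        norm_const = (1.0 / np.sqrt(2.0 * np.pi)) ** d
        K = norm_const * np.exp(-0.5 * np.sum(u**2, axis=1))
        return K.sum() / (N * h**d)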
Potential Problems
- As the number of dimensions rises, the number of "bins" explodes.
- Data must be similarly scaled if the idea of "distance" is to remain reasonable.

Classification
- Estimate p(\vec{x} \mid C_i) and use Bayes' rule.

Classification - Kernel Estimator

    \hat{p}(\vec{x} \mid C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{\vec{x} - \vec{x}^t}{h}\right) r_i^t

    \hat{P}(C_i) = \frac{N_i}{N}

    g_i(\vec{x}) = \hat{p}(\vec{x} \mid C_i)\,\hat{P}(C_i) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{\vec{x} - \vec{x}^t}{h}\right) r_i^t

Classification - k-NN Estimator

    \hat{p}(\vec{x} \mid C_i) = \frac{k_i}{N_i V^k(\vec{x})}

  where V^k(\vec{x}) is the volume of the smallest (hyper)sphere containing \vec{x} and its k nearest neighbors.

    \hat{P}(C_i) = \frac{N_i}{N}

    \hat{P}(C_i \mid \vec{x}) = \frac{\hat{p}(\vec{x} \mid C_i)\,\hat{P}(C_i)}{\hat{p}(\vec{x})} = \frac{k_i}{k}

- Assign the input to the class having the most instances among the k nearest neighbors of \vec{x}.
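A small sketch of the k-NN classification rule just stated (assign the majority class among the k nearest neighbors). The function name and the use of Euclidean distance are assumptions for illustration:

    import numpy as np
    from collections import Counter

    # Illustrative sketch; names and distance choice are assumptions, not from the slides.
    def knn_classify(x, X_train, y_train, k=5):
        """Assign x to the class with the most instances among its k nearest neighbors."""
        X_train = np.asarray(X_train, dtype=float)
        y_train = np.asarray(y_train)
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]               # indices of the k closest instances
        votes = Counter(y_train[nearest].tolist())
        return votes.most_common(1)[0][0]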
Condensed Nearest Neighbor
- 1-NN is easy to compute, and its discriminant is piecewise linear.
- But it requires that we keep the entire training set.
- Condensed NN discards "interior" points that cannot affect the discriminant.
- Finding such consistent subsets is NP-hard, so it requires approximation in practice.

Nonparametric Regression (Smoothing)
- Extending the idea of a histogram to regression gives the "regressogram":

    \hat{g}(x) = \frac{\sum_t b(x, x^t)\, r^t}{\sum_t b(x, x^t)}

  where

    b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise} \end{cases}

(Figure: Bin Smoothing)

Running Mean Smoother
- Define a "bin" centered on x:

    \hat{g}(x) = \frac{\sum_t w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\!\left(\frac{x - x^t}{h}\right)}

  where

    w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}

- Particularly popular with evenly spaced data.
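A minimal sketch of the running mean smoother defined above (boxcar weights of width h); the function name and the NaN fallback for empty bins are assumptions:

    import numpy as np

    # Illustrative sketch; names and the empty-bin fallback are assumptions, not from the slides.
    def running_mean_smooth(x, X_train, r_train, h):
        """Running mean smoother: average r^t over training points with |x - x^t| < h."""
        X_train = np.asarray(X_train, dtype=float)
        r_train = np.asarray(r_train, dtype=float)
        w = (np.abs((x - X_train) / h) < 1).astype(float)   # w(u) = 1 if |u| < 1, else 0
        if w.sum() == 0:
            return np.nan    # no training instance falls inside the bin centered on x
        return (w * r_train).sum() / w.sum()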
(Figure: Running Mean Smoothing)

Kernel Smoother

    \hat{g}(x) = \frac{\sum_t K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\!\left(\frac{x - x^t}{h}\right)}

  where K is Gaussian.
- This smoother, and the subsequent ones, can also be reformulated in terms of the closest k neighbors.

(Figure: Kernel Smoothing)

Running Line Smoother
- In the running mean we took an average over all points in a bin.
- Instead, we could fit a linear regression line to all points in a bin.
- Numerical analysis has spline techniques that smooth derivatives as well as function values.
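The kernel smoother replaces the hard bin weights with Gaussian weights. A sketch, with function and variable names assumed for illustration:

    import numpy as np

    # Illustrative sketch; names are assumptions, not from the slides.
    def kernel_smooth(x, X_train, r_train, h):
        """Kernel smoother: Gaussian-weighted average of the r^t values."""
        X_train = np.asarray(X_train, dtype=float)
        r_train = np.asarray(r_train, dtype=float)
        u = (x - X_train) / h
        K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        return (K * r_train).sum() / K.sum()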
(Figure: Running Line Smoothing)

Choosing h or k
- Small values exaggerate the effects of single instances: high variance.
- Larger values increase bias.
- Choose by cross-validation.
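A sketch of tuning h in the spirit of the slide, using a held-out validation set rather than full k-fold cross-validation; the split, the grid of candidate h values, and the squared-error criterion are assumptions:

    import numpy as np

    # Illustrative sketch; validation split and error criterion are assumptions, not from the slides.
    def choose_h(X_train, r_train, X_val, r_val, candidates):
        """Pick the bandwidth h that minimizes squared error on a held-out validation set."""
        X_train = np.asarray(X_train, dtype=float)
        r_train = np.asarray(r_train, dtype=float)

        def smooth(x, h):   # Gaussian kernel smoother from the previous slide
            u = (x - X_train) / h
            K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
            return (K * r_train).sum() / K.sum()

        errors = [np.mean([(smooth(x, h) - r) ** 2 for x, r in zip(X_val, r_val)])
                  for h in candidates]
        return candidates[int(np.argmin(errors))]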