
Nonparametric Methods. Steven J Zeil, Old Dominion Univ., Fall 2010. PowerPoint presentation transcript.



  1. Nonparametric Methods
Steven J Zeil, Old Dominion Univ., Fall 2010

  2. Outline
1. Density Estimation: Bins; Kernel Estimators; k-Nearest Neighbor; Multivariate Data
2. Classification
3. Regression

  3. Nonparametric Methods
When we cannot make assumptions about the distribution of the data, but want to apply methods similar to the ones we have already learned.
Assumption: similar inputs have similar outputs.
Secondary assumption: key functions (e.g., pdf, discriminants) change smoothly.

  7. Density Estimation
Given a training set X, can we estimate the sample distribution from the data itself?
The trick is coming up with useful summaries that do not require us to retain the entire training set after training.

  8. Bins
Divide the data into bins of width h.
Histogram:
\hat{p}(x) = \frac{\#\{x^t \text{ in same bin as } x\}}{Nh}
Naive estimator, which solves the problems of the origin and exact placement of bin boundaries:
\hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh}
This can be rewritten as
\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} w\left(\frac{x - x^t}{h}\right)
where w is a weight function:
w(u) = 1/2 if |u| < 1, 0 otherwise.
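The naive estimator and its weight-function form can be sketched in a few lines of Python (an illustration, not code from the slides; the function names are invented):

```python
def naive_density(x, data, h):
    """Naive estimator: count of samples with x - h < x^t <= x + h, over 2*N*h."""
    count = sum(1 for xt in data if x - h < xt <= x + h)
    return count / (2 * len(data) * h)

def w(u):
    """Weight function: 1/2 if |u| < 1, else 0."""
    return 0.5 if abs(u) < 1 else 0.0

def naive_density_w(x, data, h):
    """Equivalent weight-function form; note |u| < 1 is strict at both ends,
    so it differs from the count form only at the exact boundary x^t = x + h."""
    return sum(w((x - xt) / h) for xt in data) / (len(data) * h)
```

Away from bin boundaries the two forms agree exactly.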

  9. Histogram (figure)

  10. Naive Estimator (figure)

  11. Kernel Estimators
We can generalize the idea of the weight function to a kernel function, e.g., the Gaussian kernel:
K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}
Kernel estimator (a.k.a. Parzen windows):
\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)
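A minimal Parzen-window sketch with a Gaussian kernel (illustrative only; names are my own):

```python
import math

def gaussian_kernel(u):
    """K(u) = (1/sqrt(2*pi)) * exp(-u^2 / 2)."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def parzen_density(x, data, h):
    """Kernel estimate: (1/(N*h)) * sum over t of K((x - x^t)/h)."""
    n = len(data)
    return sum(gaussian_kernel((x - xt) / h) for xt in data) / (n * h)
```

Unlike the naive estimator, the estimate is smooth because every sample contributes a smoothly decaying weight.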

  12. Kernels (figure)

  13. k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting the number of neighboring instances, fix the number of neighbors k and compute the bin width:
\hat{p}(x) = \frac{k}{2N d_k(x)}
where d_k(x) is the distance to the k-th closest instance to x.
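The k-NN density estimate for one-dimensional data can be sketched as follows (an illustration, not code from the slides):

```python
def knn_density(x, data, k):
    """k-NN estimate: k / (2 * N * d_k(x)), where d_k(x) is the
    distance from x to its k-th closest training instance."""
    dists = sorted(abs(x - xt) for xt in data)
    dk = dists[k - 1]
    return k / (2 * len(data) * dk)
```

Note the estimate blows up when d_k(x) is 0 (x coincides with at least k samples); practical code would guard against that.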

  14. k-Nearest Neighbor (figure)

  15. Multivariate Data
Kernel estimator:
\hat{p}(x) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)
Multivariate Gaussian kernel (spherical):
K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^d \exp\left(-\frac{\|u\|^2}{2}\right)
Multivariate Gaussian kernel (ellipsoid):
K(u) = \frac{1}{(2\pi)^{d/2} |S|^{1/2}} \exp\left(-\frac{1}{2} u^T S^{-1} u\right)
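The spherical multivariate case can be sketched with plain lists standing in for vectors (a hedged illustration; names are invented):

```python
import math

def spherical_kernel(u):
    """K(u) = (2*pi)^(-d/2) * exp(-||u||^2 / 2) for a d-dimensional u."""
    d = len(u)
    sq = sum(ui * ui for ui in u)
    return math.exp(-sq / 2) / (2 * math.pi) ** (d / 2)

def multivariate_density(x, data, h):
    """(1/(N*h^d)) * sum over t of K((x - x^t)/h), applied componentwise."""
    n, d = len(data), len(x)
    total = sum(spherical_kernel([(xi - xti) / h for xi, xti in zip(x, xt)])
                for xt in data)
    return total / (n * h ** d)
```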

  16. Potential Problems
As the number of dimensions rises, the number of “bins” explodes (the curse of dimensionality).
Data must be similarly scaled if the idea of “distance” is to remain reasonable.

  17. Classification
Estimate p(x | C_i) and use Bayes' rule.

  18. Classification - Kernel Estimator
\hat{p}(x | C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right) r_i^t
\hat{P}(C_i) = \frac{N_i}{N}
g_i(x) = \hat{p}(x | C_i)\,\hat{P}(C_i) = \frac{1}{N h^d} \sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right) r_i^t
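For one-dimensional inputs the kernel discriminant can be sketched as below (an illustration under my own naming; labels play the role of the indicators r_i^t):

```python
import math

def gaussian_kernel(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def discriminant(x, data, labels, cls, h):
    """g_i(x) = (1/(N*h)) * sum over t of K((x - x^t)/h) * r_i^t, d = 1.
    Only samples labeled `cls` contribute (r_i^t = 1 for them, 0 otherwise)."""
    n = len(data)
    return sum(gaussian_kernel((x - xt) / h)
               for xt, y in zip(data, labels) if y == cls) / (n * h)

def classify(x, data, labels, classes, h):
    """Pick the class whose discriminant is largest at x."""
    return max(classes, key=lambda c: discriminant(x, data, labels, c, h))
```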

  19. Classification - k-NN Estimator
\hat{p}(x | C_i) = \frac{k_i}{N_i V^k(x)}
where V^k(x) is the volume of the smallest (hyper)sphere containing x and its nearest k neighbors.
\hat{P}(C_i) = \frac{N_i}{N}
\hat{P}(C_i | x) = \frac{\hat{p}(x | C_i)\,\hat{P}(C_i)}{\hat{p}(x)} = \frac{k_i}{k}
Assign the input to the class having the most instances among the k nearest neighbors of x.
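The resulting rule is a simple majority vote, sketched here for one-dimensional inputs (illustrative; not code from the slides):

```python
from collections import Counter

def knn_classify(x, data, labels, k):
    """Assign x the class with the most instances among its k nearest neighbors."""
    nearest = sorted(zip(data, labels), key=lambda p: abs(x - p[0]))[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]
```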

  20. Condensed Nearest Neighbor
1-NN is easy to compute, and its discriminant is piecewise linear, but it requires that we keep the entire training set.
Condensed NN discards “interior” points that cannot affect the discriminant.
Finding such minimal consistent subsets is NP-hard, so approximation is required in practice.

  21. Nonparametric Regression (Smoothing)
Extending the idea of a histogram to regression (the “regressogram”):
\hat{g}(x) = \frac{\sum_t b(x, x^t)\, r^t}{\sum_t b(x, x^t)}
where
b(x, x^t) = 1 if x^t is in the same bin as x, 0 otherwise.
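A regressogram sketch, assuming bins of width h starting at an arbitrary origin (illustrative only; the `origin` parameter is my own addition):

```python
import math

def regressogram(x, xs, rs, h, origin=0.0):
    """Average of r^t over the training points that fall in x's bin."""
    def bin_of(v):
        return math.floor((v - origin) / h)
    b = bin_of(x)
    in_bin = [r for xt, r in zip(xs, rs) if bin_of(xt) == b]
    return sum(in_bin) / len(in_bin) if in_bin else 0.0
```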

  22. Bin Smoothing (figure)

  23. Running Mean Smoother
Define a “bin” centered on x:
\hat{g}(x) = \frac{\sum_t w\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\left(\frac{x - x^t}{h}\right)}
where
w(u) = 1 if |u| < 1, 0 otherwise.
Particularly popular with evenly spaced data.
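Because the weight is 0/1, the formula reduces to an average over the points within h of x, as this sketch shows (illustrative names):

```python
def running_mean(x, xs, rs, h):
    """Average of r^t over training points with |(x - x^t)/h| < 1."""
    sel = [r for xt, r in zip(xs, rs) if abs((x - xt) / h) < 1]
    return sum(sel) / len(sel) if sel else 0.0
```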

  24. Running Mean Smoothing (figure)

  25. Kernel Smoother
\hat{g}(x) = \frac{\sum_t K\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\left(\frac{x - x^t}{h}\right)}
where K is Gaussian.
This, and subsequent smoothers, can also be reformulated in terms of the closest k neighbors.
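Swapping the 0/1 weight for a Gaussian gives a kernel-weighted average, sketched here (illustrative only; the unnormalized exp(-u^2/2) suffices since the normalizing constant cancels in the ratio):

```python
import math

def kernel_smoother(x, xs, rs, h):
    """Gaussian-weighted average of r^t; closer points weigh more."""
    ws = [math.exp(-((x - xt) / h) ** 2 / 2) for xt in xs]
    return sum(w * r for w, r in zip(ws, rs)) / sum(ws)
```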

  26. Kernel Smoothing (figure)

  27. Running Line Smoother
In the running mean we took an average over all points in a bin.
Instead, we could fit a linear regression line to the points in each bin.
Numerical analysis offers spline techniques that smooth derivatives as well as function values.
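A running-line sketch: fit a least-squares line to the points within h of x and evaluate it at x (illustrative; fallbacks for empty or degenerate windows are my own choices):

```python
def running_line(x, xs, rs, h):
    """Least-squares line over points with |x - x^t| < h, evaluated at x."""
    pts = [(xt, r) for xt, r in zip(xs, rs) if abs(x - xt) < h]
    n = len(pts)
    if n == 0:
        return 0.0
    mx = sum(p[0] for p in pts) / n
    mr = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    if sxx == 0:
        return mr  # all points at the same x: fall back to their mean
    slope = sum((p[0] - mx) * (p[1] - mr) for p in pts) / sxx
    return mr + slope * (x - mx)
```

Unlike the running mean, this tracks a local trend, which reduces bias at the edges of the data.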

  28. Running Line Smoothing (figure)

  29. Choosing h or k
Small values exaggerate the effects of single instances (high variance).
Larger values increase bias.
Use cross-validation to choose.
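One way to apply cross-validation here, sketched for the kernel smoother: score each candidate h by leave-one-out squared error and keep the best (a hedged illustration; the candidate-grid approach and names are my own):

```python
import math

def gaussian_kernel(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def loo_error(xs, rs, h):
    """Leave-one-out mean squared error of the kernel smoother at width h."""
    err = 0.0
    for i, (x, r) in enumerate(zip(xs, rs)):
        ws, num = 0.0, 0.0
        for j, (xt, rt) in enumerate(zip(xs, rs)):
            if j == i:
                continue  # hold out the i-th point
            wgt = gaussian_kernel((x - xt) / h)
            ws += wgt
            num += wgt * rt
        pred = num / ws if ws > 0 else 0.0
        err += (r - pred) ** 2
    return err / len(xs)

def choose_h(xs, rs, candidates):
    """Pick the candidate width with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_error(xs, rs, h))
```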
