Non-parametric Methods - Oliver Schulte - CMPT 726 - Bishop PRML Ch. 2.5

  1. Non-parametric Methods
     Oliver Schulte - CMPT 726
     Bishop PRML Ch. 2.5

  2. Outline
     • Kernel Density Estimation; Nearest-neighbour
     • These are non-parametric methods
     • Rather than having a fixed set of parameters (e.g. weight vector for regression, µ, Σ for a Gaussian), we have a possibly infinite set of parameters based on each data point
     • Fundamental distinction in machine learning:
       • Model-based, parametric: what's the rule, law, pattern?
       • Instance-based, non-parametric: what have I seen before that's similar?

  3. Histograms
     • Consider the problem of modelling the distribution of brightness values in pictures taken on sunny days versus cloudy days
     • We could build histograms of pixel values for each class

  4-7. Histograms
     • E.g. for sunny days
     • Count n_i, the number of datapoints (pixels) with brightness value falling into each bin: p_i = n_i / (N ∆_i)  (a code sketch follows below)
     • Sensitive to bin width ∆_i
     • Discontinuous due to bin edges
     • In D-dim space with M bins per dimension, M^D bins
     [Figure: histograms of the same data on [0, 1] for three different bin widths ∆]
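The following is a minimal Python/NumPy sketch of the histogram estimator p_i = n_i / (N ∆_i); it is not from the slides, and the simulated "brightness" data, bin count, and value range are illustrative assumptions.

    import numpy as np

    def histogram_density(data, num_bins, lo=0.0, hi=1.0):
        # Histogram density estimate: p_i = n_i / (N * delta_i) for each bin i.
        edges = np.linspace(lo, hi, num_bins + 1)
        counts, _ = np.histogram(data, bins=edges)   # n_i: datapoints falling in bin i
        delta = np.diff(edges)                       # bin widths delta_i
        return counts / (len(data) * delta), edges

    # Illustrative stand-in for pixel brightness values on sunny days.
    rng = np.random.default_rng(0)
    brightness = rng.beta(5.0, 2.0, size=1000)       # values in [0, 1]
    p, edges = histogram_density(brightness, num_bins=20)
    print(np.sum(p * np.diff(edges)))                # the estimate integrates to ~1

Because each p_i is constant over its bin, the estimate is piecewise constant, which is the discontinuity and bin-width sensitivity the slide points out.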

  8. Local Density Estimation
     • In a histogram we use nearby points to estimate density
     • For a small region around x, estimate the density as p(x) = K / (N V)
     • K is the number of points in the region, V is the volume of the region, N is the total number of datapoints
     • Basic principle: high probability of x ⇔ x is close to many points
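For example (the numbers are illustrative, not from the slides): if K = 5 of N = 1000 datapoints fall inside a small region of volume V = 0.01 around x, the estimate is p(x) = 5 / (1000 · 0.01) = 0.5.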

  9. Kernel Density Estimation
     • Try to keep the idea of using nearby points to estimate density, but obtain a smoother estimate
     • Estimate density by placing a small bump at each datapoint
     • Kernel function k(·) determines the shape of these bumps
     • Density estimate is p(x) ∝ (1/N) Σ_{n=1}^{N} k((x − x_n) / h)

  10. Kernel Density Estimation
     • Example using a Gaussian kernel (a code sketch follows below):
       p(x) = (1/N) Σ_{n=1}^{N} (2πh²)^{−1/2} exp(−||x − x_n||² / (2h²))
     [Figure: Gaussian kernel density estimates of the same data on [0, 1] for three different bandwidths h]
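A minimal 1-D Python/NumPy sketch of this Gaussian-kernel estimator (not from the slides; the sample data, evaluation grid, and bandwidth are illustrative assumptions):

    import numpy as np

    def gaussian_kde(x, data, h):
        # p(x) = (1/N) * sum_n (2*pi*h^2)^(-1/2) * exp(-(x - x_n)^2 / (2*h^2))
        data = np.asarray(data)
        diffs = x[:, None] - data[None, :]                      # pairwise differences, shape (len(x), N)
        bumps = np.exp(-diffs ** 2 / (2 * h ** 2)) / np.sqrt(2 * np.pi * h ** 2)
        return bumps.mean(axis=1)                               # average one Gaussian bump per datapoint

    # Illustrative data and evaluation grid.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0.3, 0.05, 40), rng.normal(0.7, 0.1, 60)])
    grid = np.linspace(0.0, 1.0, 200)
    density = gaussian_kde(grid, data, h=0.05)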

  11-14. Kernel Density Estimation
     • Other kernels: Rectangle, Triangle, Epanechnikov
     • Fast at training time, slow at test time – keep all datapoints
     • Sensitive to kernel bandwidth h (illustrated in the sketch below)
     [Figures: the kernel functions plotted on [−3, 3]; a resulting density estimate on data spanning roughly [−5, 30]]
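A small self-contained demonstration of the bandwidth sensitivity (the data and the values of h are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(0.5, 0.1, size=200)            # illustrative 1-D sample
    grid = np.linspace(0.0, 1.0, 400)

    # The same Gaussian-kernel estimate evaluated with three bandwidths:
    # a very small h gives a spiky, noisy estimate; a large h over-smooths.
    for h in (0.005, 0.05, 0.3):
        bumps = np.exp(-(grid[:, None] - data[None, :]) ** 2 / (2 * h ** 2))
        density = bumps.mean(axis=1) / np.sqrt(2 * np.pi * h ** 2)
        print(f"h = {h}: max density = {density.max():.2f}")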

  15. Nearest-neighbour
     • Instead of relying on the kernel bandwidth to get a proper density estimate, fix the number of nearby points K and let V be the volume of the smallest region around x that contains K datapoints: p(x) = K / (N V)  (a code sketch follows below)
     • Note: the result diverges and is not a proper density estimate
     [Figure: K-nearest-neighbour density estimates of the same data on [0, 1] for three different values of K]
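A corresponding 1-D Python/NumPy sketch of the K / (N V) estimator (not from the slides; in one dimension the "volume" is the length of the interval reaching the K-th nearest datapoint, and the data and K are illustrative):

    import numpy as np

    def knn_density(x, data, K):
        # p(x) = K / (N * V), where V is the length of the smallest interval
        # centred at x that contains the K nearest datapoints.
        data = np.asarray(data)
        dists = np.sort(np.abs(data[None, :] - x[:, None]), axis=1)  # sorted distances per query point
        radius = dists[:, K - 1]                                     # distance to the K-th nearest datapoint
        return K / (len(data) * 2.0 * radius)                        # V = 2 * radius in one dimension

    # Illustrative data and query grid.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0.3, 0.05, 100), rng.normal(0.7, 0.1, 100)])
    grid = np.linspace(0.01, 0.99, 200)
    density = knn_density(grid, data, K=5)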

  16-18. Nearest-neighbour for Classification
     • K-nearest-neighbour is often used for classification (a code sketch follows below)
     • Classification: predict labels t_i from x_i
     • E.g. x_i ∈ R², t_i ∈ {0, 1}, 3-nearest-neighbour
     • K = 1 is referred to as nearest-neighbour
     [Figures (a) and (b): labelled 2-D datapoints plotted in the (x_1, x_2) plane]
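A small sketch of K-nearest-neighbour classification by majority vote (Python/NumPy; the 2-D dataset is simulated for illustration and is not the slide's data):

    import numpy as np

    def knn_classify(x, train_x, train_t, K=3):
        # Predict the label of a single point x by majority vote among its K nearest training points.
        dists = np.linalg.norm(train_x - x, axis=1)    # Euclidean distances to all training points
        nearest = np.argsort(dists)[:K]                # indices of the K nearest neighbours
        return np.bincount(train_t[nearest]).argmax()  # majority label

    # Illustrative 2-D data with labels in {0, 1}, as on the slide.
    rng = np.random.default_rng(1)
    class0 = rng.normal([0.0, 0.0], 0.5, size=(20, 2))
    class1 = rng.normal([1.5, 1.5], 0.5, size=(20, 2))
    train_x = np.vstack([class0, class1])
    train_t = np.array([0] * 20 + [1] * 20)
    print(knn_classify(np.array([1.0, 1.0]), train_x, train_t, K=3))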

  19. Nearest-neighbour for Classification
     • Good baseline method
     • Slow, but fancy data structures can be used for efficiency (KD-trees, locality-sensitive hashing); a KD-tree sketch follows below
     • Nice theoretical properties:
       • As we obtain more training datapoints, the space becomes more densely filled with labelled data
       • As N → ∞, the error is no more than twice the Bayes error
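The slides name KD-trees only generically; as one concrete option, SciPy's cKDTree can serve this purpose (assuming SciPy is available; the training data and query point below are illustrative):

    import numpy as np
    from scipy.spatial import cKDTree

    # Illustrative 2-D training set.
    rng = np.random.default_rng(1)
    train_x = np.vstack([rng.normal(0.0, 0.5, size=(500, 2)),
                         rng.normal(1.5, 0.5, size=(500, 2))])
    train_t = np.array([0] * 500 + [1] * 500)

    # Build the tree once; each query then avoids scanning all N training points.
    tree = cKDTree(train_x)
    dists, idx = tree.query(np.array([1.0, 1.0]), k=3)   # 3 nearest neighbours of the query
    print(np.bincount(train_t[idx]).argmax())            # majority-vote label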

  20. Conclusion
     • Readings: Ch. 2.5
     • Kernel density estimation
       • Model the density p(x) using kernels around the training datapoints
     • Nearest-neighbour
       • Model the density or perform classification using the nearest training datapoints
     • Multivariate Gaussian
       • Needed for next week's lectures; if you need a refresher, read pp. 78-81
