Lecture 14: Local linear regression, non-parametric estimation, perceptron and update algorithm


  1. Lecture 14: Local linear regression non-parametric estimation, perceptron and update algorithm, etc.
     Instructor: Prof. Ganesh Ramakrishnan
     March 1, 2016

  2. Basis function expansion & Kernel: Part 1
     We saw that for p ∈ [0, ∞), under certain conditions on K, the following can be equivalent representations:
       f(x) = ∑_{j=1}^{p} w_j φ_j(x)
       f(x) = ∑_{i=1}^{m} α_i K(x, x_i)
     For what kind of regularizers, loss functions and p ∈ [0, ∞) will these dual representations hold?
     (Section 5.8.1 of Tibshi.)

  3. Basis function expansion & Kernel: Part 2
     We could also begin with f(x) = ∑_{i=1}^{m} α_i K(x, x_i) and impose no constraints on K.
     E.g.: K_k(x_q, x) = I(||x_q − x|| ≤ ||x_(k) − x_q||), where x_(k) is the training observation ranked k-th in distance from x_q and I(S) is the indicator of the set S.
     This is precisely the Nearest Neighbor Regression model.
     Kernel regression and density models are other examples of such local regression methods.
     (Section 2.8.2 of Tibshi.)
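As a concrete illustration (not from the slides), here is a minimal NumPy sketch of nearest-neighbor regression written in exactly this kernel form, on an assumed 1-D toy dataset; the helper names are my own.

```python
import numpy as np

def knn_indicator_kernel(x_q, X, k):
    """K_k(x_q, x_i) = I(||x_q - x_i|| <= ||x_(k) - x_q||): weight 1 for the
    k training points closest to the query x_q, and 0 for all others."""
    dists = np.abs(X - x_q)              # distances from the query to every training point
    kth_dist = np.sort(dists)[k - 1]     # distance of the k-th ranked neighbor
    return (dists <= kth_dist).astype(float)

def knn_regression(x_q, X, y, k=3):
    """f(x_q) = sum_i alpha_i K_k(x_q, x_i) with alpha_i = y_i / k,
    i.e. the average target of the k nearest neighbors."""
    K = knn_indicator_kernel(x_q, X, k)
    return (K * y).sum() / K.sum()

# assumed toy data: y = sin(x) + noise
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(X) + 0.1 * rng.standard_normal(50)
print(knn_regression(1.0, X, y, k=3))    # should be close to sin(1.0) ≈ 0.84
```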

  4. Kernel weighted regression
     Given a training set of points D = {(x_1, y_1), ..., (x_i, y_i), ..., (x_n, y_n)}, we predict a regression function f(x′) = w′⊤ φ(x′) + b′ for each test (or query) point x′ as follows:
       (w′, b′) = argmin_{w, b} ∑_{i=1}^{n} K(x′, x_i) (y_i − (w⊤ φ(x_i) + b))²
     with weights obtained using some kernel K(·, ·).
     1. If there is a closed-form expression for (w′, b′), and therefore for f(x′), in terms of the known quantities, derive it.
     2. How does this model compare with linear regression and k-nearest neighbor regression? What are the relative advantages and disadvantages of this model?
     3. In the one-dimensional case (that is, when φ(x) ∈ ℜ), try to interpret graphically what this regression model would look like, say when K(·, ·) is the linear kernel. (Hint: What would the regression function look like at each training data point?)
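A rough closed-form sketch of exercise 1 above, assuming φ is the identity and K is a Gaussian kernel (both are my choices for illustration; the slide leaves them open): for each query x′ we solve a weighted least-squares problem and read off (w′, b′).

```python
import numpy as np

def gaussian_kernel(x_query, X, bandwidth=0.5):
    """K(x', x_i): down-weight training points that are far from the query."""
    return np.exp(-((X - x_query) ** 2) / (2.0 * bandwidth ** 2))

def local_linear_predict(x_query, X, y, bandwidth=0.5):
    """(w', b') = argmin_{w,b} sum_i K(x', x_i) (y_i - (w x_i + b))^2, solved in
    closed form via the weighted normal equations, then f(x') = w' x' + b'."""
    Phi = np.column_stack([X, np.ones_like(X)])          # rows [phi(x_i), 1], phi = identity
    W = np.diag(gaussian_kernel(x_query, X, bandwidth))  # kernel weights on the diagonal
    beta = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ y)
    w_prime, b_prime = beta
    return w_prime * x_query + b_prime

# assumed toy data: y = sin(x) + noise
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 80)
y = np.sin(X) + 0.1 * rng.standard_normal(80)
print(local_linear_predict(2.0, X, y))                   # should be close to sin(2.0) ≈ 0.91
```

Note that the fit is recomputed for every query point; that per-query refitting is what makes the method local and non-parametric.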

  5. More on kernels after some classification
     We will delve a bit more into kernel density estimation etc. after some treatment of classification.

  6. Perceptron

  7. w⊤ φ(x) + b ≥ 0 for +ve points (y = +1)
     w⊤ φ(x) + b < 0 for -ve points (y = −1), where w, φ ∈ ℝ^m
     Assuming the problem is linearly separable, there is a learning rule that converges in finite time.
     A new (unseen) input pattern that is similar to an old (seen) input pattern is likely to be classified correctly.
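A tiny sketch of this decision rule, with made-up numbers (the vectors below are purely illustrative):

```python
import numpy as np

def predict(w, phi, b):
    """Linear decision rule: +1 if w^T phi(x) + b >= 0, else -1."""
    return 1 if w @ phi + b >= 0 else -1

w, b = np.array([1.0, -2.0]), 0.5
phi_pos = np.array([2.0, 0.5])   # w^T phi + b = 1.5  -> classified as +1
phi_neg = np.array([-1.0, 1.0])  # w^T phi + b = -2.5 -> classified as -1
print(predict(w, phi_pos, b), predict(w, phi_neg, b))
```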

  8. Often, b is indirectly captured by including it in w and using an augmented φ: φ_aug = [φ, 1].
     Thus, w⊤ φ(x) = [w_1, w_2, w_3, ..., w_m, b] [φ_1, φ_2, φ_3, ..., φ_m, 1]⊤
     and w⊤ φ(x) = 0 is the separating hyperplane.
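A small sketch of this bias-absorption trick, again with made-up numbers: appending 1 to φ and b to w leaves the score w⊤ φ(x) + b unchanged.

```python
import numpy as np

phi = np.array([0.5, -1.2, 2.0])           # phi(x) in R^3 (illustrative values)
w, b = np.array([1.0, 0.3, -0.7]), 0.4

phi_aug = np.append(phi, 1.0)              # phi_aug = [phi, 1]
w_aug = np.append(w, b)                    # the bias b becomes the last weight

# both give the same score, so the hyperplane can be written as w_aug^T phi_aug = 0
print(w @ phi + b, w_aug @ phi_aug)        # identical values
```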

  9. Perceptron Intuition
     Go over all the existing examples, whose class is known, and check their classification with the current weight vector.
     If correct, continue.
     If not, add to the weights a quantity that is proportional to the product of the input pattern with the desired output y (+1 or −1).

  10. Perceptron Update Rule
      Start with some weight vector w^(0), and for k = 1, 2, 3, ..., n (for every example), do:
        w^(k) = w^(k−1) + Γ y′ φ(x′)
      where x′ is an example misclassified by w^(k−1), i.e. y′ (w^(k−1))⊤ φ(x′) < 0.
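A minimal runnable sketch of this update rule, assuming NumPy, a fixed Γ = 1, the augmented features of slide 8, and a small synthetic linearly separable dataset (all of these choices are mine, not from the lecture):

```python
import numpy as np

def perceptron_train(Phi, y, gamma=1.0, max_passes=100):
    """Perceptron update rule: whenever example x' is misclassified,
    i.e. y' * w^T phi(x') <= 0, set w <- w + gamma * y' * phi(x')."""
    n, m = Phi.shape
    w = np.zeros(m)                              # w^(0)
    for _ in range(max_passes):
        mistakes = 0
        for i in range(n):                       # go over every example
            if y[i] * (w @ Phi[i]) <= 0:         # misclassified (or on the boundary)
                w = w + gamma * y[i] * Phi[i]    # the update rule from the slide
                mistakes += 1
        if mistakes == 0:                        # a full pass with no mistakes: done
            break
    return w

# assumed toy data: two well-separated clusters in R^2
rng = np.random.default_rng(0)
pos = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(20, 2))
neg = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(20, 2))
X = np.vstack([pos, neg])
y = np.array([1] * 20 + [-1] * 20)
Phi = np.column_stack([X, np.ones(len(X))])      # phi_aug = [x, 1] (bias absorbed)
w = perceptron_train(Phi, y)
print(np.all(np.sign(Phi @ w) == y))             # True once a separating w is found
```

Checking `y[i] * (w @ Phi[i]) <= 0` rather than strictly `< 0` also triggers an update for points lying exactly on the hyperplane, which matters when w starts at zero.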

  16. Perceptron does not find the best separating hyperplane; it finds any separating hyperplane.
      In case the initial w does not classify all the examples, the separating hyperplane corresponding to the final w∗ will often pass through an example.
      The separating hyperplane does not provide enough breathing space; this is what SVMs address!
