CS440/ECE448: Intro to Artificial Intelligence
Lecture 24: Perceptrons
Prof. Julia Hockenmaier
juliahmr@illinois.edu
http://cs.illinois.edu/fa11/cs440

Regression

Linear regression
Given some data {(x, y), ...} with x, y ∈ R, find a function f(x) = w1 x + w0 such that f(x) ≈ y.

Squared loss
We want to find a weight vector w which minimizes the loss (error) on the training data {(x1, y1), ..., (xN, yN)}:
L(w) = Σ_{i=1..N} L2(f_w(xi), yi) = Σ_{i=1..N} (yi − f_w(xi))²
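As a concrete illustration (not from the original slides), here is a minimal Python sketch of this squared loss for the one-dimensional model f(x) = w1 x + w0; the data values are invented for the example.

```python
import numpy as np

def squared_loss(w1, w0, xs, ys):
    """Sum of squared errors of f(x) = w1*x + w0 over the training data."""
    preds = w1 * xs + w0
    return np.sum((ys - preds) ** 2)

# Invented example data
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 3.1, 4.9, 7.2])
print(squared_loss(2.0, 1.0, xs, ys))  # loss of the candidate fit f(x) = 2x + 1
```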
Linear regression
We need to minimize the loss on the training data: w = argmin_w Loss(f_w).
We need to set the partial derivatives of Loss(f_w) with respect to w1 and w0 to zero.
This has a closed-form solution for linear regression (see book).

Gradient descent
In general, we won't be able to find a closed-form solution, so we need an iterative (local search) algorithm.
We start with an initial weight vector w, and update each element iteratively in the direction of its gradient:
wi := wi − α ∂/∂wi Loss(w)

Binary classification with Naïve Bayes
For each item x = (x1, ..., xd), we compute
fk(x) = P(x | Ck) P(Ck) = P(Ck) Πi P(xi | Ck)
for both classes C1 and C2.
We assign class C1 to x if f1(x) > f2(x).
Equivalently, we can define a 'discriminant function' f(x) = f1(x) − f2(x) and assign class C1 to x if f(x) > 0.

Binary classification
The input x = (x1, ..., xd) ∈ R^d is a real-valued vector. We want to learn f(x).
We assume the two classes are linearly separable.
[Figure: linearly separable data (classes + and x) in the (x1, x2) plane, separated by the decision boundary f(x) = 0.]
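As an illustration of the gradient-descent update (not from the slides), here is a minimal Python sketch for the squared loss above; the learning rate alpha and the iteration count are arbitrary choices.

```python
import numpy as np

def fit_by_gradient_descent(xs, ys, alpha=0.01, n_iters=1000):
    """Fit f(x) = w1*x + w0 by gradient descent on the squared loss."""
    w0, w1 = 0.0, 0.0
    for _ in range(n_iters):
        preds = w1 * xs + w0
        # Partial derivatives of sum_i (y_i - f(x_i))^2 with respect to w0 and w1
        grad_w0 = -2.0 * np.sum(ys - preds)
        grad_w1 = -2.0 * np.sum((ys - preds) * xs)
        # Move each weight against its gradient
        w0 -= alpha * grad_w0
        w1 -= alpha * grad_w1
    return w0, w1
```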
Binary classification
The input x = (x1, ..., xd) ∈ R^d is a real-valued vector. We want to learn f(x).
We assume the classes are linearly separable, so we choose a linear discriminant function:
f(x) = w · x + w0
– w = (w1, ..., wd) ∈ R^d is a weight vector
– w0 is a bias term
– −w0 is also called a threshold: the decision boundary is where w · x = −w0

Binary classification
The weight vector w defines the orientation of the decision boundary.
The bias term w0 defines the perpendicular distance of the decision boundary to the origin.
[Figure: decision boundary in the (x1, x2) plane, with the weight vector w perpendicular to it and its distance to the origin determined by w0.]

Binary classification
Equivalently, redefine
x = (1, x1, ..., xd) ∈ R^{d+1}
w = (w0, w1, ..., wd) ∈ R^{d+1}
f(x) = w · x
and define C1 = 1, C2 = 0.

Binary classification
Our classification hypothesis then becomes
hw(x) = 1 if f(x) = w · x ≥ 0, and 0 otherwise.
We can also think of hw(x) as a threshold function:
hw(x) = Threshold(w · x), where Threshold(z) = 1 if z ≥ 0, and 0 otherwise.
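A minimal Python sketch of this hypothesis (not from the slides), assuming each input already carries a leading 1 so that w0 is folded into the weight vector; the numbers are invented.

```python
import numpy as np

def h_w(w, x):
    """Threshold classifier: 1 if w . x >= 0, else 0.
    Assumes x = (1, x1, ..., xd) so that w[0] plays the role of the bias w0."""
    return 1 if np.dot(w, x) >= 0 else 0

w = np.array([-1.0, 2.0, 0.5])   # (w0, w1, w2), invented values
x = np.array([1.0, 0.8, 0.3])    # (1, x1, x2)
print(h_w(w, x))                 # 1: the point lies on the positive side of the boundary
```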
Learning the weights
We need to choose w to minimize classification loss.
But we cannot compute this in closed form, because the gradient of w is either 0 or undefined.
Iterative solution:
– Start with an initial weight vector w.
– For each example (x, y), update the weights w until all items are correctly classified.

Observations
If we classify an item (x, y) correctly, we don't need to change w.
If we classify an item (x, y) incorrectly, there are two cases:
– y = 1 (the item lies above the true decision boundary) but hw(x) = 0 (it falls below ours): we need to move our decision boundary up!
– y = 0 (the item lies below the true decision boundary) but hw(x) = 1 (it falls above ours): we need to move our decision boundary down!

Learning the weights (initial attempt)
Iterative solution:
– Start with an initial weight vector w.
– For each example (x, y), update the weights w until all items are correctly classified.
Update rule: for each example (x, y), update each weight wi:
wi := wi + (y − hw(x)) xi

Learning the weights
Evaluating y − hw(x) tells us what to do:
– hw(x) is correct: y − hw(x) = 0 (stay!)
– If y = 1 but we predict hw(x) = 0: y − hw(x) = 1 − 0 = 1 (move up!)
– If y = 0 but we predict hw(x) = 1: y − hw(x) = 0 − 1 = −1 (move down!)
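A minimal Python sketch of this update rule (not from the slides), again assuming a leading 1 in x for the bias; the example values are invented.

```python
import numpy as np

def perceptron_update(w, x, y):
    """One update w_i := w_i + (y - h_w(x)) * x_i, applied to all weights at once."""
    prediction = 1 if np.dot(w, x) >= 0 else 0
    return w + (y - prediction) * x

w = np.array([0.0, -1.0, 0.5])       # current weights (w0, w1, w2)
x = np.array([1.0, 2.0, 1.0])        # misclassified positive example (1, x1, x2)
w = perceptron_update(w, x, y=1)     # y - h_w(x) = 1, so the boundary moves toward x
print(w)                             # [1.  1.  1.5]
```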
There is a problem:
Real data is not perfectly separable.
There will be noise, and our features may not be sufficient.
[Figure: noisy data in the (x1, x2) plane with the boundary f(x) = 0; some + and x points fall on the wrong side of the boundary.]

Learning the weights
Observation:
When we've only seen a few examples, we want the weights to change a lot.
After we've seen a lot of examples, we want the weights to change less and less, because we can now classify most examples correctly.
Solution: we need a learning rate which decays over time.

Learning the weights (Perceptron algorithm)
Iterative solution:
– Start with an initial weight vector w.
– For each example (x, y), update the weights w until w has converged (does not change significantly anymore).
Perceptron update rule ('online'): for each example (x, y), update each weight wi:
wi := wi + α (y − hw(x)) xi
where the learning rate α decays over time t (t = #examples seen), e.g. α = n / (n + t).
Now it always converges, regardless of α (which only influences the rate), and whether or not the training points are linearly separable.

Batch/Epoch Perceptron Learning
Choose a convergence criterion (#epochs, min |Δw|, ...).
Choose a learning rate α and an initial w.
Repeat until convergence:
– Δw = Σ_x α · err_x (sum over the training set, holding w fixed)
– w ← w + Δw (update with the accumulated changes)
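A minimal Python sketch of the online perceptron with a decaying learning rate, as described above (not from the slides); the constant n, the epoch count, and the data format are assumptions for illustration.

```python
import numpy as np

def train_perceptron(X, Y, n=100.0, epochs=50):
    """Online perceptron with decaying learning rate alpha = n / (n + t).
    Each row of X starts with a 1 (bias); Y holds 0/1 labels."""
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            alpha = n / (n + t)                      # learning rate decays with #examples seen
            prediction = 1 if np.dot(w, x) >= 0 else 0
            w = w + alpha * (y - prediction) * x     # online perceptron update
            t += 1
    return w
```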