Least Mean Squares Regression
Machine Learning
Least Squares Method for regression
• Examples
• The LMS objective
• Gradient descent
• Incremental/stochastic gradient descent
What’s the mileage?
Suppose we want to predict the mileage of a car from its weight and age.
What we want: A function that can predict mileage using x_1 and x_2.

Weight x_1 (x 100 lb)   Age x_2 (years)   Mileage
31.5                    6                 21
36.2                    2                 25
43.1                    0                 18
27.6                    2                 30
Linear regression: The strategy
Predicting continuous values using a linear model.
Assumption: The output is a linear function of the inputs:
    Mileage = w_0 + w_1 x_1 + w_2 x_2
w_0, w_1, w_2 are the parameters of the model, also called weights; collectively, the vector w.
Learning: Use the training data to find the best possible value of w.
Prediction: Given the values of x_1, x_2 for a new car, use the learned w to predict the mileage of the new car.
Linear regression: The strategy
For simplicity, we will assume that the first feature is always 1, i.e. x_1 = 1. This makes the notation easier.
• Inputs are vectors: x ∈ ℝ^d, written x = [x_1, x_2, ⋯, x_d]^T (with x_1 = 1)
• Outputs are real numbers: y ∈ ℝ
• We have a training set D = {(x_1, y_1), (x_2, y_2), ⋯}
• We want to approximate y as
    y = w_1 + w_2 x_2 + ⋯ + w_d x_d = w^T x
  w is the learned weight vector in ℝ^d
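To make the w^T x form concrete, here is a minimal sketch in Python/NumPy using the mileage rows from the earlier table. The weight values are made up for illustration, not learned ones.

```python
import numpy as np

# Each input gets a leading 1 so that the intercept is absorbed into the dot product w^T x.
X = np.array([
    [1.0, 31.5, 6.0],   # [1, weight (x 100 lb), age (years)]
    [1.0, 36.2, 2.0],
    [1.0, 43.1, 0.0],
    [1.0, 27.6, 2.0],
])
w = np.array([45.0, -0.6, -1.0])  # hypothetical weight vector [w_1, w_2, w_3]

predictions = X @ w  # one prediction per row: w^T x_i
print(predictions)
```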
Examples
One-dimensional input: predict using y = w_1 + w_2 x_2
[Figure: data points and a fitted line, with y on the vertical axis and the input on the horizontal axis]
The linear function is not our only choice. We could have tried to fit the data as another polynomial.
Two-dimensional input: predict using y = w_1 + w_2 x_2 + w_3 x_3
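As a sketch of the "another polynomial" remark: the same linear machinery fits a quadratic if we simply add x^2 as an extra feature, because the model stays linear in w. The data values below are made up for illustration.

```python
import numpy as np

# Fit y = w_1 + w_2 x + w_3 x^2 by treating 1, x, and x^2 as three features.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 7.2, 12.8, 21.0])

X = np.column_stack([np.ones_like(x), x, x ** 2])  # feature vectors [1, x, x^2]
w, *_ = np.linalg.lstsq(X, y, rcond=None)          # least-squares fit of the weights
print(w)                                           # still linear in w, only the features changed
```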
Least Squares Method for regression
• Examples
• The LMS objective
• Gradient descent
• Incremental/stochastic gradient descent
What is the best weight vector?
Question: How do we know which weight vector is the best one for a training set?
For an example (x_i, y_i) in the training set, the cost of a mistake is the error
    y_i − w^T x_i
Define the cost (or loss) for a particular weight vector w to be the sum of squared costs over the training set:
    J(w) = (1/2) Σ_i (y_i − w^T x_i)^2
One strategy for learning: Find the w with the least cost on this data.
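A minimal sketch of this cost as code, assuming the examples are stacked as rows of a matrix X (first column fixed to 1) and the targets collected in a vector y; the function name is ours.

```python
import numpy as np

def lms_cost(w, X, y):
    """Sum of squared costs J(w) = 1/2 * sum_i (y_i - w^T x_i)^2.
    X holds one example per row (first column fixed to 1), y holds the targets."""
    errors = y - X @ w
    return 0.5 * np.sum(errors ** 2)

# Example call with the toy mileage data and a made-up weight vector:
# lms_cost(np.array([45.0, -0.6, -1.0]), X, y)
```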
Least Mean Squares (LMS) Regression
Learning: minimizing the mean squared error
    min_w J(w) = min_w (1/2) Σ_i (y_i − w^T x_i)^2
Different strategies exist for learning by optimization.
• Gradient descent is a popular algorithm.
(For this particular minimization objective, there is also an analytical solution. No need for gradient descent.)
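The analytical solution mentioned here comes from setting the gradient of J to zero, which gives the normal equations X^T X w = X^T y. A minimal sketch, assuming X^T X is invertible; the function name is ours.

```python
import numpy as np

def lms_closed_form(X, y):
    """Analytical minimizer of J(w): solve the normal equations X^T X w = X^T y.
    Assumes X^T X is invertible; np.linalg.lstsq is the more robust alternative."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```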
Least Squares Method for regression
• Examples
• The LMS objective
• Gradient descent
• Incremental/stochastic gradient descent
Gradient descent
We are trying to minimize J(w).
General strategy for minimizing a function J(w):
• Start with an initial guess for w, say w^0
• Iterate till convergence:
  – Compute the gradient of J at w^t
  – Update w^t to get w^{t+1} by taking a step in the opposite direction of the gradient
Intuition: The gradient is the direction of steepest increase in the function. To get to the minimum, go in the opposite direction.
[Figure: J(w) plotted against w, with successive iterates w^0, w^1, w^2, w^3 stepping down toward the minimum]
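A minimal sketch of this general strategy, with the gradient supplied as a function; the stopping test and the default constants are illustrative choices, not something prescribed by the slides.

```python
import numpy as np

def gradient_descent(grad, w0, r=0.01, max_iters=1000, tol=1e-6):
    """Generic gradient descent: repeatedly step against the gradient of J.
    `grad(w)` returns the gradient of J at w; `r` is the learning rate."""
    w = w0
    for _ in range(max_iters):
        g = grad(w)
        w = w - r * g                      # step in the opposite direction of the gradient
        if np.linalg.norm(g) < tol:        # stop once the gradient is (nearly) zero
            break
    return w
```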
Gradient descent for LMS
We are trying to minimize J(w) = (1/2) Σ_i (y_i − w^T x_i)^2.
1. Initialize w^0
2. For t = 0, 1, 2, ⋯:
   a. Compute the gradient of J(w) at w^t. Call it ∇J(w^t)
   b. Update w as follows:
        w^{t+1} = w^t − r ∇J(w^t)
r: called the learning rate (for now, a small constant; we will get to this later)
What is the gradient of J?
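Before deriving that gradient, here is a sketch of what the loop above looks like in code for the LMS objective. The learning rate and iteration count are placeholder values, and the gradient line anticipates the derivation on the next slide.

```python
import numpy as np

def lms_gradient_descent(X, y, r=0.001, num_iters=1000):
    """Batch gradient descent for J(w) = 1/2 * sum_i (y_i - w^T x_i)^2.
    Each step applies w^{t+1} = w^t - r * grad J(w^t)."""
    w = np.zeros(X.shape[1])        # initialize w^0
    for _ in range(num_iters):
        errors = y - X @ w          # y_i - w^T x_i for every example
        grad = -X.T @ errors        # gradient: minus the sum of error * input
        w = w - r * grad            # step opposite to the gradient
    return w
```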
Gradient of the cost
We are trying to minimize J(w) = (1/2) Σ_i (y_i − w^T x_i)^2.
The gradient is of the form
    ∇J(w) = [∂J/∂w_1, ∂J/∂w_2, ⋯, ∂J/∂w_d]^T
Remember that w is a vector with d elements: w = [w_1, w_2, w_3, ⋯, w_j, ⋯, w_d]
One element of the gradient vector:
    ∂J/∂w_j = ∂/∂w_j (1/2) Σ_i (y_i − w^T x_i)^2
            = Σ_i (y_i − w^T x_i) · ∂/∂w_j (y_i − w^T x_i)
            = −Σ_i (y_i − w^T x_i) x_{ij}
That is, each element is (the negative of) the sum of Error × Input.
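The same expression as code: the whole gradient vector can be computed at once by multiplying the errors back through the inputs. A sketch, with a finite-difference comparison noted as an optional sanity check; the function name is ours.

```python
import numpy as np

def lms_gradient(w, X, y):
    """Gradient of J(w) = 1/2 * sum_i (y_i - w^T x_i)^2.
    Element j is -sum_i (y_i - w^T x_i) * x_ij, i.e. minus the sum of error * input."""
    errors = y - X @ w
    return -X.T @ errors

# Optional sanity check of element j against a finite difference of the cost:
# (lms_cost(w + eps * e_j, X, y) - lms_cost(w - eps * e_j, X, y)) / (2 * eps)
```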