Applied Machine Learning
Multilayer Perceptron
Siamak Ravanbakhsh
COMP 551 (winter 2020)
Learning objectives

the multilayer perceptron:
- the model
- different supervised learning tasks
- activation functions
- the architecture of a neural network
- its expressive power
- regularization techniques
Adaptive bases

several methods can be classified as learning a set of bases adaptively,

    f(x) = \sum_d w_d \phi_d(x; v_d)

- decision trees
- generalized additive models
- boosting
- neural networks

in this lecture we:
- consider the adaptive bases in a general form (contrast to decision trees)
- use gradient descent to find good parameters (contrast to boosting)
- create more complex adaptive bases by combining simpler bases, which leads to deep neural networks (a generic sketch of this form follows)
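a generic sketch of the adaptive-basis form (the helper name and the plugged-in Gaussian basis are my own illustration, not from the slides): any basis function phi with per-basis parameters v_d can be combined linearly with weights w_d.

    import numpy as np

    def adaptive_basis_model(phi, w, v):
        # f(x) = sum_d w[d] * phi(x, v[d]); w is (D,), v holds one parameter (vector) per basis
        return lambda x: sum(w[d] * phi(x, v[d]) for d in range(len(w)))

    # e.g. Gaussian bases phi(x; mu) = exp(-(x - mu)^2) with three centers:
    f = adaptive_basis_model(lambda x, mu: np.exp(-(x - mu)**2),
                             w=np.ones(3), v=np.array([0.0, 1.0, 2.0]))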
Adaptive Radial Bases

Gaussian bases (radial bases):

    \phi_d(x) = e^{-\frac{(x - \mu_d)^2}{s^2}}

non-adaptive case

    model: f(x; w) = \sum_d w_d \phi_d(x)
    cost:  J(w) = \frac{1}{2} \sum_n \left( f(x^{(n)}; w) - y^{(n)} \right)^2

- the centers \mu_d are fixed
- the model is linear in its parameters
- the cost is convex in w (unique minimum), and even has a closed-form solution:

    import numpy as np
    import matplotlib.pyplot as plt

    # x: (N,) inputs, y: (N,) targets
    plt.plot(x, y, 'b.')
    phi = lambda x, mu: np.exp(-(x - mu)**2)    # Gaussian basis with s = 1
    mu = np.linspace(0, 4, 10)                  # 10 fixed centers in [0, 4]
    Phi = phi(x[:, None], mu[None, :])          # N x 10 design matrix
    w = np.linalg.lstsq(Phi, y, rcond=None)[0]  # closed-form least squares
    yh = Phi @ w
    plt.plot(x, yh, 'g-')

adaptive case

we can make the bases adaptive by learning the centers:

    model: f(x; w, \mu) = \sum_d w_d \phi(x; \mu_d)

how to minimize the cost?
- the cost is no longer convex in all model parameters
- use gradient descent to find a local minimum; during training, the basis centers adaptively move (see the sketch below)
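a minimal sketch of this gradient descent (not from the slides; the function name, step size, and iteration count are my choices), learning both the weights w and the centers mu for the squared-error cost above:

    import numpy as np

    def fit_adaptive_rbf(x, y, D=10, lr=0.01, iters=1000):
        mu = np.linspace(x.min(), x.max(), D)             # initialize centers on a grid
        w = np.zeros(D)
        for _ in range(iters):
            Phi = np.exp(-(x[:, None] - mu[None, :])**2)  # N x D basis matrix
            r = Phi @ w - y                               # residuals f(x^(n)) - y^(n)
            grad_w = Phi.T @ r                            # dJ/dw_d = sum_n r_n phi_d(x_n)
            # dJ/dmu_d = sum_n r_n w_d phi_d(x_n) * 2 (x_n - mu_d)
            grad_mu = (r[:, None] * w[None, :] * Phi
                       * 2 * (x[:, None] - mu[None, :])).sum(axis=0)
            w -= lr * grad_w
            mu -= lr * grad_mu
        return w, mu

depending on the initialization, gradient descent may reach different local minima of this non-convex cost.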
Sigmoid Bases

    \phi_d(x) = \frac{1}{1 + e^{-(x - \mu_d)/s_d}}

using adaptive sigmoid bases gives us a neural network

non-adaptive case

\mu_d is fixed to D locations and s_d = 1

    model: f(x; w) = \sum_d w_d \phi_d(x)

[figure: a one-layer network computing \hat{y} = \sum_d w_d \phi_d(x) from the fixed features \phi_1(x), ..., \phi_D(x) with weights w_1, ..., w_D; fitted curves are shown for D = 10, D = 5, and D = 3 (a sketch reproducing this comparison follows)]

    import numpy as np
    import matplotlib.pyplot as plt

    # x: (N,) inputs, y: (N,) targets
    plt.plot(x, y, 'b.')
    phi = lambda x, mu: 1/(1 + np.exp(-(x - mu)))  # sigmoid basis with s_d = 1
    mu = np.linspace(0, 3, 10)                     # 10 fixed centers in [0, 3]
    Phi = phi(x[:, None], mu[None, :])             # N x 10 design matrix
    w = np.linalg.lstsq(Phi, y, rcond=None)[0]
    yh = Phi @ w
    plt.plot(x, yh, 'g-')
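a small sketch (assuming x and y as above) reproducing the D = 3, 5, 10 comparison in the figure:

    import numpy as np
    import matplotlib.pyplot as plt

    phi = lambda x, mu: 1/(1 + np.exp(-(x - mu)))
    plt.plot(x, y, 'b.')
    for D in [3, 5, 10]:
        mu = np.linspace(0, 3, D)                  # D fixed centers
        Phi = phi(x[:, None], mu[None, :])         # N x D design matrix
        w = np.linalg.lstsq(Phi, y, rcond=None)[0]
        plt.plot(x, Phi @ w, label='D=%d' % D)
    plt.legend()

with too few bases the model underfits; more bases give a more flexible fit.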
Adaptive Sigmoid Bases

    \phi_d(x) = \frac{1}{1 + e^{-(x - \mu_d)/s_d}}

rewrite the sigmoid basis:

    \phi_d(x) = \sigma\left( \frac{x - \mu_d}{s_d} \right) = \sigma(v_d x + b_d), \quad \text{with } v_d = 1/s_d \text{ and } b_d = -\mu_d/s_d

when the input has more than one dimension, each basis is a logistic regression model:

    \phi_d(x) = \sigma(v_d^\top x + b_d)

    model: f(x; w, v, b) = \sum_d w_d \sigma(v_d^\top x + b_d)

this is a neural network with one hidden layer (see the training sketch below).
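a minimal sketch of training this one-hidden-layer model by full-batch gradient descent on the squared-error cost (not from the slides; the function name, initialization, step size, and iteration count are my choices):

    import numpy as np

    def sigmoid(z):
        return 1/(1 + np.exp(-z))

    def fit_one_hidden_layer(X, y, D=10, lr=0.01, iters=5000, seed=0):
        # X: (N, F) inputs, y: (N,) targets
        rng = np.random.default_rng(seed)
        V = rng.normal(scale=0.1, size=(X.shape[1], D))  # v_d stacked as columns
        b = np.zeros(D)
        w = rng.normal(scale=0.1, size=D)
        for _ in range(iters):
            Phi = sigmoid(X @ V + b)               # N x D hidden activations
            r = Phi @ w - y                        # residuals f(x^(n)) - y^(n)
            grad_w = Phi.T @ r                     # dJ/dw
            delta = r[:, None] * w[None, :] * Phi * (1 - Phi)  # dJ/dz, via sigma' = sigma (1 - sigma)
            grad_V = X.T @ delta                   # dJ/dV
            grad_b = delta.sum(axis=0)             # dJ/db
            w -= lr * grad_w
            V -= lr * grad_V
            b -= lr * grad_b
        return w, V, b

the identity \sigma'(z) = \sigma(z)(1 - \sigma(z)) is what propagates the residuals back through the hidden layer.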