Applied Machine Learning: Multilayer Perceptron
Siamak Ravanbakhsh
COMP 551 (winter 2020)

Learning objectives
multilayer perceptron:
- model
- different supervised learning tasks
- activation functions
- architecture of a neural network
- its expressive power
- regularization techniques

Adaptive bases
Several methods can be classified as learning these bases adaptively:
$$f(x) = \sum_d w_d \, \phi_d(x; v_d)$$
- decision trees
- generalized additive models
- boosting
- neural networks

Neural networks:
- consider the adaptive bases in a general form (in contrast to decision trees)
- use gradient descent to find good parameters (in contrast to boosting)
- create more complex adaptive bases by combining simpler bases, which leads to deep neural networks
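The general form can be sketched as a function that takes the basis family as an argument, so that the same model covers different bases. This is a minimal illustration under assumptions: the names `adaptive_basis_model`, `gaussian` and `sigmoid` are hypothetical, and each $v_d$ is assumed to hold a (center, scale) pair. "Learning the bases adaptively" would mean adjusting the rows of `V` together with `w`.

```python
import numpy as np

def adaptive_basis_model(x, w, V, phi):
    """Evaluate f(x) = sum_d w_d * phi(x; v_d).

    x:   inputs, shape (N,)
    w:   output weights, shape (D,)
    V:   per-basis parameters, row d is v_d (hypothetical layout: center, scale)
    phi: basis function phi(x, v_d) returning an array of shape (N,)
    """
    # Stack the D basis responses into an N x D design matrix.
    Phi = np.stack([phi(x, v) for v in V], axis=1)
    return Phi @ w

# Two illustrative basis families; v = (center, scale) in both.
gaussian = lambda x, v: np.exp(-((x - v[0]) / v[1]) ** 2)
sigmoid = lambda x, v: 1.0 / (1.0 + np.exp(-(x - v[0]) / v[1]))

x = np.linspace(0, 4, 5)
V = np.array([[1.0, 1.0], [3.0, 0.5]])  # D = 2 bases
w = np.array([0.5, -1.0])
print(adaptive_basis_model(x, w, V, gaussian))
print(adaptive_basis_model(x, w, V, sigmoid))
```

Only `phi` changes between the two calls; the weighted sum over bases stays the same.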

Adaptive radial bases
Gaussian bases, or radial bases:
$$\phi_d(x) = e^{-\frac{(x - \mu_d)^2}{s^2}}$$

Non-adaptive case
model: $f(x; w) = \sum_d w_d \, \phi_d(x)$
cost: $J(w) = \frac{1}{2} \sum_n \left(f(x^{(n)}; w) - y^{(n)}\right)^2$
- the centers are fixed
- the model is linear in its parameters
- the cost is convex in $w$ (with a unique minimum when the design matrix has full column rank) and even has a closed-form solution

```python
import numpy as np
import matplotlib.pyplot as plt

# x: inputs, shape (N,); y: targets, shape (N,) (assumed given)
plt.plot(x, y, 'b.')
phi = lambda x, mu: np.exp(-(x - mu)**2)      # Gaussian basis with s = 1
mu = np.linspace(0, 4, 10)                    # 10 fixed Gaussian bases
Phi = phi(x[:, None], mu[None, :])            # N x 10 design matrix
w = np.linalg.lstsq(Phi, y, rcond=None)[0]    # closed-form least squares
yh = np.dot(Phi, w)
plt.plot(x, yh, 'g-')
```

Adaptive case
We can make the bases adaptive by learning the centers:
model: $f(x; w, \mu) = \sum_d w_d \, \phi_d(x; \mu_d)$
How do we minimize the cost?
- it is not convex in all the model parameters
- use gradient descent to find a local minimum
- note that the basis centers change adaptively during training
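Gradient descent for the adaptive case needs the partial derivatives of $J(w, \mu) = \frac{1}{2}\sum_n \left(f(x^{(n)}; w, \mu) - y^{(n)}\right)^2$. Writing $r^{(n)} = f(x^{(n)}; w, \mu) - y^{(n)}$ and using $\frac{\partial \phi_d}{\partial \mu_d} = \phi_d(x)\,\frac{2(x - \mu_d)}{s^2}$:

$$\frac{\partial J}{\partial w_d} = \sum_n r^{(n)} \phi_d(x^{(n)}; \mu_d), \qquad \frac{\partial J}{\partial \mu_d} = \sum_n r^{(n)} w_d \, \phi_d(x^{(n)}; \mu_d)\,\frac{2\,(x^{(n)} - \mu_d)}{s^2}$$

The sketch below applies plain full-batch gradient descent to both $w$ and $\mu$. It is a sketch under stated assumptions, not the course's implementation: $s = 1$ is kept fixed, the toy data (a noisy sine), learning rate, step count and zero initialization of `w` are hypothetical choices, and the function names are illustrative.

```python
import numpy as np

def phi(x, mu, s=1.0):
    # N x D matrix with entries exp(-(x_n - mu_d)^2 / s^2)
    return np.exp(-((x[:, None] - mu[None, :]) ** 2) / s**2)

def cost(x, y, w, mu, s=1.0):
    # J(w, mu) = 1/2 * sum_n (f(x_n; w, mu) - y_n)^2
    r = phi(x, mu, s) @ w - y
    return 0.5 * np.sum(r**2)

def gradients(x, y, w, mu, s=1.0):
    Phi = phi(x, mu, s)                          # N x D
    r = Phi @ w - y                              # residuals, shape (N,)
    grad_w = Phi.T @ r                           # dJ/dw_d
    dPhi_dmu = Phi * 2 * (x[:, None] - mu[None, :]) / s**2
    grad_mu = w * (dPhi_dmu.T @ r)               # dJ/dmu_d
    return grad_w, grad_mu

def fit_adaptive_rbf(x, y, D=10, lr=2e-4, steps=10000, s=1.0):
    w = np.zeros(D)
    mu = np.linspace(0, 4, D)                    # start from the fixed-center grid
    for _ in range(steps):
        grad_w, grad_mu = gradients(x, y, w, mu, s)
        w -= lr * grad_w                         # update the weights ...
        mu -= lr * grad_mu                       # ... and move the centers
    return w, mu

# Hypothetical toy data, only for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(0, 4, 100)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(100)

w, mu = fit_adaptive_rbf(x, y)
print("cost with w = 0:", cost(x, y, np.zeros(10), np.linspace(0, 4, 10)))
print("cost after training:", cost(x, y, w, mu))
print("learned centers:", np.round(mu, 2))
```

Because `w` starts at zero, the center gradient is zero on the first step; the centers start moving once the weights become nonzero. Since the cost is not convex in $\mu$, a different initialization may end in a different local minimum.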

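Sigmoid bases
$$\phi_d(x) = \frac{1}{1 + e^{-\frac{x - \mu_d}{s_d}}}$$
Using adaptive sigmoid bases gives us a neural network.

Non-adaptive case
- $\mu_d$ is fixed to $D$ locations
- $s_d = 1$
model: $f(x; w) = \sum_d w_d \, \phi_d(x)$

[Figure: network diagram, output $\hat{y}$ computed from the bases $\phi_1(x), \phi_2(x), \dots, \phi_D(x)$ with weights $w_1, w_2, \dots, w_D$]

```python
import numpy as np
import matplotlib.pyplot as plt

# x: inputs, shape (N,); y: targets, shape (N,) (assumed given)
plt.plot(x, y, 'b.')
phi = lambda x, mu: 1 / (1 + np.exp(-(x - mu)))  # sigmoid basis with s_d = 1
mu = np.linspace(0, 3, 10)                       # D = 10 fixed centers
Phi = phi(x[:, None], mu[None, :])               # N x 10 design matrix
w = np.linalg.lstsq(Phi, y, rcond=None)[0]       # closed-form least squares
yh = np.dot(Phi, w)
plt.plot(x, yh, 'g-')
```

[Figure: sigmoid-basis fits for D=10, D=5 and D=3]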
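The effect of the number of bases can be explored by refitting the same fixed-center model for several values of $D$ and comparing the training cost. This is a sketch under assumptions: the toy data (a noisy sine on $[0, 3]$) and the helper names `sigmoid_design` and `fit_fixed_sigmoid` are hypothetical, and the printed costs depend entirely on that made-up data.

```python
import numpy as np

def sigmoid_design(x, mu):
    # N x D matrix with entries 1 / (1 + exp(-(x_n - mu_d))), i.e. s_d = 1
    return 1.0 / (1.0 + np.exp(-(x[:, None] - mu[None, :])))

def fit_fixed_sigmoid(x, y, D):
    mu = np.linspace(0, 3, D)                    # D centers fixed in advance
    Phi = sigmoid_design(x, mu)
    w = np.linalg.lstsq(Phi, y, rcond=None)[0]   # closed-form least squares
    return w, Phi @ w

# Hypothetical toy data, only for illustration.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 3, 50))
y = np.sin(3 * x) + 0.1 * rng.standard_normal(50)

costs = {}
for D in (10, 5, 3):
    w, yh = fit_fixed_sigmoid(x, y, D)
    costs[D] = 0.5 * np.sum((yh - y) ** 2)
    print(f"D={D}: training cost {costs[D]:.3f}")
```

With these grids the 3 centers $\{0, 1.5, 3\}$ are a subset of the 5 centers, so on the same data the $D=5$ training cost cannot exceed the $D=3$ cost. The 10-center grid does not contain the 5-center grid, so no such guarantee holds between $D=5$ and $D=10$.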

Adaptive sigmoid bases
$$\phi_d(x) = \frac{1}{1 + e^{-\frac{x - \mu_d}{s_d}}}$$
Rewrite the sigmoid basis as
$$\phi_d(x) = \sigma\left(\frac{x - \mu_d}{s_d}\right) = \sigma(v_d x + b_d), \qquad v_d = \frac{1}{s_d}, \quad b_d = -\frac{\mu_d}{s_d}$$
Assuming the input has more than one dimension, $\phi_d(x) = \sigma(v_d^\top x + b_d)$: each basis is a logistic regression model.
model: $f(x; w, v, b) = \sum_d w_d \, \sigma(v_d^\top x + b_d)$
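With multi-dimensional inputs, the adaptive sigmoid model is a network with one hidden layer of sigmoid units and a linear output. The sketch below vectorizes the $D$ bases by stacking the $v_d^\top$ as rows of a matrix `V`, checks the $v_d = 1/s_d$, $b_d = -\mu_d/s_d$ rewrite numerically in 1-D, and trains $w$, $V$ and $b$ jointly with full-batch gradient descent on the squared-error cost from the radial-basis slide. Assumptions: the names `forward`, `cost` and `grads`, the 2-D toy data, the random initialization and the learning rate are all hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, w, V, b):
    # X: N x P inputs; row d of V is v_d (D x P); b and w have shape (D,)
    H = sigmoid(X @ V.T + b)            # N x D, H[n, d] = sigma(v_d^T x_n + b_d)
    return H @ w, H                     # f(x_n) = sum_d w_d H[n, d]

def cost(X, y, w, V, b):
    return 0.5 * np.sum((forward(X, w, V, b)[0] - y) ** 2)

def grads(X, y, w, V, b):
    yh, H = forward(X, w, V, b)
    r = yh - y                           # residuals, shape (N,)
    grad_w = H.T @ r
    dZ = np.outer(r, w) * H * (1 - H)    # chain rule: sigma' = sigma * (1 - sigma)
    return grad_w, dZ.T @ X, dZ.sum(axis=0)

# 1-D check of the rewrite v = 1/s, b = -mu/s (arbitrary values)
mu_d, s_d = 1.5, 0.5
x = np.linspace(-2, 4, 7)
assert np.allclose(sigmoid((x - mu_d) / s_d), sigmoid(x / s_d - mu_d / s_d))

# Hypothetical 2-D toy data and initialization, only for illustration.
rng = np.random.default_rng(0)
N, P, D = 200, 2, 8
X = rng.standard_normal((N, P))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
w = 0.1 * rng.standard_normal(D)
V = rng.standard_normal((D, P))
b = np.zeros(D)

J0 = cost(X, y, w, V, b)
lr = 2e-4
for _ in range(3000):
    gw, gV, gb = grads(X, y, w, V, b)
    w -= lr * gw
    V -= lr * gV
    b -= lr * gb
print("cost before:", J0, "after:", cost(X, y, w, V, b))
```

`V` and `b` are initialized randomly rather than at zero: if all rows of `V` and all `b_d` started equal, every hidden unit would receive the same gradient and the bases would never differentiate.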
