Learning From Data, Lecture 20: Multilayer Perceptron
Multiple layers · Universal approximation · The neural network
M. Magdon-Ismail, CSCI 4100/6100
Recap: Unsupervised Learning
k-means clustering gives a 'hard' partition into k clusters; the Gaussian mixture model gives 'soft' probability density estimation $P(x)$.
[Figures: a k-means partition of the data; a Gaussian mixture density P(x) plotted against x]
The Neural Network: Biologically Inspired
[Figure: a biological neural network as the inspiration for the engineered model]
Planes Don't Flap Wings to Fly
Engineering success may start with biological inspiration, but then take a totally different path.
XOR: A Limitation of the Linear Model
The XOR target is +1 when exactly one of $x_1, x_2$ is +1; no single linear separator can realize it.
[Figure: the XOR regions on the (x_1, x_2) plane, with +1 and −1 labels in opposite quadrants]
Decomposing XOR
$f = h_1\bar{h}_2 + \bar{h}_1 h_2$, where $h_1(\mathbf{x}) = \mathrm{sign}(\mathbf{w}_1^{\mathsf{T}}\mathbf{x})$ and $h_2(\mathbf{x}) = \mathrm{sign}(\mathbf{w}_2^{\mathsf{T}}\mathbf{x})$ are linear separators.
[Figure: the XOR regions obtained by overlaying the two half-planes defined by h_1 and h_2]
Perceptrons for OR and AND
With inputs $x_1, x_2 \in \{\pm 1\}$:
$\textsc{or}(x_1, x_2) = \mathrm{sign}(x_1 + x_2 + 1.5)$
$\textsc{and}(x_1, x_2) = \mathrm{sign}(x_1 + x_2 - 1.5)$
[Figure: the two perceptrons drawn as networks, with bias weights +1.5 and −1.5 and unit weights on x_1, x_2]
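A minimal sketch verifying these two perceptrons on all four ±1 inputs (Python is my choice here; the slides give only the formulas):

```python
def sign(s):
    # the slides' sign convention, with outputs in {-1, +1}
    return 1 if s >= 0 else -1

def OR(x1, x2):
    # OR(x1, x2) = sign(x1 + x2 + 1.5), inputs in {-1, +1}
    return sign(x1 + x2 + 1.5)

def AND(x1, x2):
    # AND(x1, x2) = sign(x1 + x2 - 1.5)
    return sign(x1 + x2 - 1.5)

for a in (-1, 1):
    for b in (-1, 1):
        print(f"x=({a:+d},{b:+d})  OR={OR(a,b):+d}  AND={AND(a,b):+d}")
```

The bias of ±1.5 sits between the signal values 0 and ±2, which is what makes a single threshold implement each gate.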
Representing f Using OR and AND
$f = h_1\bar{h}_2 + \bar{h}_1 h_2$
[Figure: an OR node with bias +1.5 and unit weights combines the two AND terms $h_1\bar{h}_2$ and $\bar{h}_1 h_2$ to produce f]
Representing f Using OR and AND (expanding the ANDs)
Each AND term becomes a sign node with bias −1.5; a negated input $\bar{h}$ enters with weight −1.
[Figure: two-layer network on h_1, h_2: two AND nodes (bias −1.5, weights ±1) feeding the OR node (bias +1.5)]
Representing f Using OR and AND (expanding h_1, h_2)
Finally, replace $h_1$ and $h_2$ by the perceptrons $\mathrm{sign}(\mathbf{w}_1^{\mathsf{T}}\mathbf{x})$ and $\mathrm{sign}(\mathbf{w}_2^{\mathsf{T}}\mathbf{x})$ on the raw inputs $x_1, x_2$.
[Figure: the complete three-layer network from inputs x_1, x_2 to output f]
The Multilayer Perceptron (MLP)
A single perceptron computes $\mathrm{sign}(\mathbf{w}^{\mathsf{T}}\mathbf{x})$ with weights $w_0, w_1, w_2$; more layers allow us to implement f. These additional layers are called hidden layers.
[Figure: the XOR network side by side with a single perceptron]
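To make the construction concrete, here is a minimal sketch of the three-layer network above as a forward pass. The weight vectors $\mathbf{w}_1, \mathbf{w}_2$ are left generic on the slides; the particular choice below ($h_1 = \mathrm{sign}(x_1)$, $h_2 = \mathrm{sign}(x_2)$) is an assumption of mine that makes f exactly the XOR of the raw inputs:

```python
import numpy as np

def sign(s):
    return 1 if s >= 0 else -1

def xor_mlp(x1, x2, w1, w2):
    """Hand-wired 3-layer MLP computing f = h1*~h2 + ~h1*h2."""
    x = np.array([1.0, x1, x2])     # input with bias coordinate
    h1 = sign(w1 @ x)               # first linear separator
    h2 = sign(w2 @ x)               # second linear separator
    a1 = sign(-1.5 + h1 - h2)       # AND(h1, NOT h2): bias -1.5, weights +1, -1
    a2 = sign(-1.5 - h1 + h2)       # AND(NOT h1, h2): bias -1.5, weights -1, +1
    return sign(1.5 + a1 + a2)      # OR of the two AND terms: bias +1.5

# One illustrative choice of separators (an assumption, not from the slides):
w1 = np.array([0.0, 1.0, 0.0])      # h1 = sign(x1)
w2 = np.array([0.0, 0.0, 1.0])      # h2 = sign(x2)
for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, xor_mlp(a, b, w1, w2))   # prints the XOR truth table
```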
Universal Approximation
Any target function f that can be decomposed into linear separators can be implemented by a 3-layer MLP.
Universal Approximation (circle example)
A sufficiently smooth decision boundary can "essentially" be decomposed into linear separators.
[Figure: a circular +1 region approximated by 8 perceptrons and, more finely, by 16 perceptrons]
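A minimal sketch of this idea, under the assumption that the target's +1 region is the inside of a circle of radius r: take m perceptrons whose boundaries are tangent lines at equally spaced angles, and AND them together with one more perceptron. Increasing m (8, 16, ...) tightens the polygonal fit:

```python
import numpy as np

def disc_mlp(x, y, m=8, r=1.0):
    """Approximate 'inside the circle of radius r' with m perceptrons
    (tangent half-planes at equally spaced angles) plus one AND node."""
    angles = 2 * np.pi * np.arange(m) / m
    # perceptron i: sign(r - x cos(a_i) - y sin(a_i))
    h = np.sign(r - x * np.cos(angles) - y * np.sin(angles))
    # AND of all m perceptrons is itself a perceptron: bias -(m - 0.5)
    return int(np.sign(h.sum() - (m - 0.5)))

print(disc_mlp(0.0, 0.0, m=8))    # +1: well inside the circle
print(disc_mlp(1.5, 1.5, m=8))    # -1: well outside
print(disc_mlp(0.99, 0.0, m=16))  # +1: just inside the boundary
```

The m half-planes circumscribe the circle, so the error region is the thin band between the circle and the polygon, which shrinks as m grows.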
Approximation Versus Generalization
The size of the MLP controls the approximation-generalization tradeoff:
more nodes per hidden layer ⟹ approximation ↑, generalization ↓.
Minimizing $E_{\mathrm{in}}$
Minimizing $E_{\mathrm{in}}$ is a combinatorial problem, even harder for the MLP than for the single perceptron.
$E_{\mathrm{in}}$ is not smooth (because of the sign function), so we cannot use gradient descent directly.
$\mathrm{sign}(x) \approx \tanh(x)$ → gradient descent to minimize $E_{\mathrm{in}}$.
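A minimal sketch of this smoothing trick, assuming squared error and a tiny 2-2-1 tanh network trained on the four XOR points; the architecture, learning rate, and seed are my choices, not the lecture's. Once sign is replaced by tanh, $E_{\mathrm{in}}$ is smooth in the weights and the chain rule gives the gradients (the essence of backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

# The four XOR points with +/-1 labels
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])

# 2-2-1 network, tanh in place of sign, small random initial weights
W1 = rng.normal(scale=0.5, size=(3, 2))   # (bias + 2 inputs) -> 2 hidden
W2 = rng.normal(scale=0.5, size=(3, 1))   # (bias + 2 hidden) -> 1 output

eta = 0.1
for _ in range(10000):
    X0 = np.hstack([np.ones((4, 1)), X])             # inputs with bias
    S1 = X0 @ W1                                      # hidden signals
    X1 = np.hstack([np.ones((4, 1)), np.tanh(S1)])    # hidden outputs
    out = np.tanh(X1 @ W2).ravel()                    # network output

    # gradient of the mean squared error, through both tanh layers
    d_out = 2 * (out - y) * (1 - out ** 2)
    gW2 = X1.T @ d_out[:, None] / 4
    d_hid = d_out[:, None] * W2[1:, 0] * (1 - np.tanh(S1) ** 2)
    gW1 = X0.T @ d_hid / 4

    W2 -= eta * gW2
    W1 -= eta * gW1

# typically prints [-1, 1, 1, -1]; a bad seed can stall in a local minimum
print(np.sign(out))
```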
The Neural Network
[Figure: a feed-forward network: inputs x_1, ..., x_d (plus bias nodes 1) pass through layers of θ(s) nodes to the output h(x); input layer ℓ = 0, hidden layers 0 < ℓ < L, output layer ℓ = L]
Zooming into a Hidden Node
[Figure: layer ℓ sits between layers ℓ−1 and ℓ+1: the weights W^(ℓ) carry x^(ℓ−1) into the signals s^(ℓ), the θ nodes produce x^(ℓ), and the weights W^(ℓ+1) carry these out]

Layer ℓ parameters (layers ℓ = 0, 1, 2, ..., L; layer ℓ has "dimension" $d^{(\ell)}$, i.e. $d^{(\ell)} + 1$ nodes):

  $s^{(\ell)}$       signals in     $d^{(\ell)}$-dimensional input vector
  $x^{(\ell)}$       outputs        $(d^{(\ell)} + 1)$-dimensional output vector
  $W^{(\ell)} = [\,\mathbf{w}_1^{(\ell)}\ \mathbf{w}_2^{(\ell)} \cdots \mathbf{w}_{d^{(\ell)}}^{(\ell)}\,]$   weights in    $(d^{(\ell-1)} + 1) \times d^{(\ell)}$ matrix
  $W^{(\ell+1)}$     weights out    $(d^{(\ell)} + 1) \times d^{(\ell+1)}$ matrix
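This notation translates directly into a forward pass: $s^{(\ell)} = (W^{(\ell)})^{\mathsf{T}} x^{(\ell-1)}$ and $x^{(\ell)} = [1;\ \theta(s^{(\ell)})]$. A minimal sketch, assuming θ = tanh and that the output is read off the last layer; the tiny 2-3-1 architecture and random weights are illustrative placeholders:

```python
import numpy as np

def forward(x, weights, theta=np.tanh):
    """Forward propagation: s(l) = W(l)^T x(l-1), x(l) = [1; theta(s(l))]."""
    xl = np.concatenate(([1.0], x))              # x(0): bias node plus inputs
    for W in weights:                             # W(l) is (d(l-1)+1) x d(l)
        s = W.T @ xl                              # signals into layer l
        xl = np.concatenate(([1.0], theta(s)))    # outputs of layer l
    return xl[1:]                                 # h(x): drop the bias node

# A hypothetical 2-3-1 network (d(0)=2, d(1)=3, d(2)=1)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # (d(0)+1) x d(1)
W2 = rng.normal(size=(4, 1))   # (d(1)+1) x d(2)
print(forward(np.array([0.5, -0.2]), [W1, W2]))
```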
The Neural Network
Biology → Engineering: from the biological inspiration to the engineered feed-forward network.
[Figure: the same network diagram as before: inputs x_1, ..., x_d through θ(s) layers to h(x); input layer ℓ = 0, hidden layers 0 < ℓ < L, output layer ℓ = L]