Natural Language Understanding
Lecture 2: Revision of neural networks and backpropagation
Adam Lopez (alopez@inf.ed.ac.uk)
Credits: Mirella Lapata and Frank Keller
19 January 2018
School of Informatics, University of Edinburgh
Biological neural networks
• A neuron receives inputs and combines them in the cell body.
• If the input reaches a threshold, the neuron may fire (produce an output).
• Some inputs are excitatory, while others are inhibitory.
The relationship of artificial neural networks to the brain

While the brain metaphor is sexy and intriguing, it is also distracting and cumbersome to manipulate mathematically. (Goldberg 2015)
The perceptron: an artificial neuron

Developed by Frank Rosenblatt in 1957.

[Diagram: inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ feed a summation unit $\Sigma$, followed by an activation function $f$, producing output $y$.]

Input function:
$$u(x) = \sum_{i=1}^{n} w_i x_i$$

Activation function (threshold):
$$y = f(u(x)) = \begin{cases} 1, & \text{if } u(x) > \theta \\ 0, & \text{otherwise} \end{cases}$$

Activation state: 0 or 1 (or $-1$ or 1).
• Inputs are in the range [0, 1], where 0 is “off” and 1 is “on”.
• Weights can be any real number (positive or negative).
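A minimal sketch of this computation in Python (the function names u and f follow the slide's notation; the example inputs and weights are illustrative assumptions, not from the slides):

```python
# Forward computation of a threshold perceptron.

def u(x, w):
    """Input function: weighted sum u(x) = sum_i w_i * x_i."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

def f(ux, theta):
    """Threshold activation: fire (1) iff the weighted sum exceeds theta."""
    return 1 if ux > theta else 0

# Example: one excitatory weight outweighing two inhibitory ones.
x = [1, 1, 0]
w = [0.8, -0.2, -0.5]
print(f(u(x, w), theta=0.5))  # -> 1, since 0.8 - 0.2 = 0.6 > 0.5
```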
Perceptrons can represent logic functions

Perceptron for AND: weights $w_1 = w_2 = 0.5$; activation rule: if $\sum \geq 1$ then 1 else 0.

x1  x2  x1 AND x2
0   0   0
0   1   0
1   0   0
1   1   1

Worked examples:
• Input (0, 1): $0 \cdot 0.5 + 1 \cdot 0.5 = 0.5 < 1$, so the output is 0.
• Input (1, 1): $1 \cdot 0.5 + 1 \cdot 0.5 = 1 \geq 1$, so the output is 1.
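As a check, a small Python sketch (the helper name perceptron is our choice, not from the slides) evaluates these weights on all four inputs:

```python
# Verify the AND perceptron: weights (0.5, 0.5) and the rule
# "if the weighted sum >= 1 then 1 else 0".

def perceptron(x, w, theta):
    """Output 1 iff the weighted input sum reaches the threshold theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        out = perceptron((x1, x2), w=(0.5, 0.5), theta=1)
        print(x1, x2, "->", out)  # matches the AND truth table
```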
Perceptrons can represent logic functions

Perceptron for OR: weights $w_1 = w_2 = 0.5$; activation rule: if $\sum \geq 0.5$ then 1 else 0.

x1  x2  x1 OR x2
0   0   0
0   1   1
1   0   1
1   1   1

Worked example:
• Input (0, 1): $0 \cdot 0.5 + 1 \cdot 0.5 = 0.5 \geq 0.5$, so the output is 1.
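Reusing the perceptron helper from the AND sketch, only the threshold changes:

```python
# The same weights with threshold 0.5 implement OR: only input (0, 0)
# has a weighted sum below 0.5.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron((x1, x2), w=(0.5, 0.5), theta=0.5))
```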
How would you represent NOT(OR)?

Perceptron for NOT(OR): weights ???; activation rule: if $\sum \geq$ ??? then 1 else 0.

x1  x2  NOT(x1 OR x2)
0   0   1
0   1   0
1   0   0
1   1   0
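If you want to check a guess (or find one), a brute-force search over a small grid of candidate weights and thresholds is enough. This sketch, including the grid, is an illustrative assumption rather than part of the slides:

```python
# Search for weights and a threshold that reproduce the NOT(OR) table,
# reusing the perceptron helper defined earlier.
nor_table = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0}
candidates = [v / 2 for v in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
for w1 in candidates:
    for w2 in candidates:
        for theta in candidates:
            if all(perceptron(x, (w1, w2), theta) == t
                   for x, t in nor_table.items()):
                print("found:", w1, w2, theta)
                # Negative weights do the trick, e.g. w1 = w2 = -0.5
                # with threshold 0.
```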
Perceptrons are linear classifiers

[Diagram: perceptron with a constant input $x_0 = -1$ weighted by $w_0$, and inputs $x_1, \dots, x_n$ weighted by $w_1, \dots, w_n$, computing $\sum_{i=0}^{n} w_i x_i$.]
Perceptrons are linear classifiers, i.e., they can only separate points with a hyperplane (a straight line in two dimensions).

[Figure: two classes of points in the plane separated by a straight line.]
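Concretely, the decision rule $w_1 x_1 + w_2 x_2 \geq \theta$ splits the plane along the line $w_1 x_1 + w_2 x_2 = \theta$. A sketch using the AND weights from earlier (the extra test point is an illustrative assumption):

```python
# The AND perceptron's boundary is 0.5*x1 + 0.5*x2 = 1, i.e. x1 + x2 = 2.
w1, w2, theta = 0.5, 0.5, 1.0
for point in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]:
    side = w1 * point[0] + w2 * point[1] - theta
    label = "positive side" if side >= 0 else "negative side"
    print(point, label)
# Only points on or above the line x1 + x2 = 2 are classified as 1.
```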
Perceptrons can learn logic functions from examples

Give some examples to the perceptron:

N  input x    target t  output o
1  (0,1,0,0)  1         0
2  (1,0,0,0)  0         0
3  (0,1,1,1)  0         1
4  (1,0,1,0)  0         1
5  (1,1,1,1)  1         0
6  (0,1,0,0)  1         1
...

• Input: a vector of 1’s and 0’s — a feature vector.
• Target: the desired output, a 1 or 0.
• Output: what the perceptron actually produces.
• How do we efficiently find the weights and threshold?
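Written out as data, the examples above might look like this in Python (the variable name training_data is our choice; the learning sketch later reuses it):

```python
# The training examples from the table, as (input vector, target) pairs.
training_data = [
    ((0, 1, 0, 0), 1),
    ((1, 0, 0, 0), 0),
    ((0, 1, 1, 1), 0),
    ((1, 0, 1, 0), 0),
    ((1, 1, 1, 1), 1),
    ((0, 1, 0, 0), 1),
]
```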
Learning

Q1: Choosing weights and threshold θ for the perceptron by hand is not easy! What’s an effective way to learn the weights and threshold from examples?

A1: We use a learning algorithm that adjusts the weights and threshold based on examples.

http://www.youtube.com/watch?v=vGwemZhPlsA&feature=youtu.be
Simplify by converting θ into a weight

$$\sum_{i=1}^{n} w_i x_i > \theta$$

$$\sum_{i=1}^{n} w_i x_i - \theta > 0$$

$$w_1 x_1 + w_2 x_2 + \dots + w_n x_n - \theta > 0$$

$$w_1 x_1 + w_2 x_2 + \dots + w_n x_n + \theta(-1) > 0$$

[Diagram: the same perceptron, now with an extra constant input $x_0 = -1$ whose weight is $w_0 = \theta$.]
Let $x_0 = -1$ be an extra input with fixed value $-1$, whose weight is $w_0 = \theta$. Now our activation function is:

$$y = f(u(x)) = \begin{cases} 1, & \text{if } u(x) > 0 \\ 0, & \text{otherwise} \end{cases}$$

where $u(x) = \sum_{i=0}^{n} w_i x_i$ now includes the bias term.
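A sketch of this bias trick in Python (the helper name with_bias is an assumption; note the comparison is now the strict test u(x) > 0, so the AND threshold is nudged slightly below 1 to keep the same behaviour):

```python
# Fold theta into the weight vector: prepend a constant input x0 = -1
# with weight w0 = theta, so u(x) > theta becomes simply u(x) > 0.
def with_bias(x, w, theta):
    xs = [-1] + list(x)       # x0 = -1, a fixed pseudo-input
    ws = [theta] + list(w)    # w0 = theta, now just another weight
    u = sum(wi * xi for wi, xi in zip(ws, xs))
    return 1 if u > 0 else 0

# The AND perceptron again, with no explicit threshold in the rule.
# Theta is set to 0.9 so that input (1, 1), whose weighted sum is
# exactly 1, still fires under the strict ">" test.
print(with_bias((1, 1), w=(0.5, 0.5), theta=0.9))  # -> 1
print(with_bias((0, 1), w=(0.5, 0.5), theta=0.9))  # -> 0
```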
Learn by adjusting weights whenever output ≠ target

Intuition: the classification depends on which side of 0 the weighted sum $u(x)$ falls. If the output differs from the target, adjust the weights to push $u(x)$ toward the correct side of 0.

• o = 0 and t = 0: don’t adjust the weights.
• o = 0 and t = 1: u(x) was too low. Make it bigger!
• o = 1 and t = 0: u(x) was too high. Make it smaller!
• o = 1 and t = 1: don’t adjust the weights.
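These four cases are what the standard perceptron update rule $w_i \leftarrow w_i + \eta\,(t - o)\,x_i$ implements; the slide gives only the cases, so treating this formula (and the learning rate and epoch count below) as the intended rule is our assumption. A sketch that trains on the training_data defined earlier:

```python
# Perceptron learning rule with the bias trick: when output o differs
# from target t, move each weight so that u(x) shifts toward the
# correct side of 0.

def train(data, n_features, eta=0.1, epochs=20):
    w = [0.0] * (n_features + 1)  # w[0] plays the role of theta
    for _ in range(epochs):
        for x, t in data:
            xs = [-1] + list(x)   # prepend the constant bias input
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
            if o != t:  # the two "don't adjust" cases leave w unchanged
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
    return w

weights = train(training_data, n_features=4)
print(weights)  # learned bias (first entry) and feature weights
```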