Natural Language Understanding
Lecture 2: Revision of neural networks and backpropagation
Adam Lopez (alopez@inf.ed.ac.uk)
Credits: Mirella Lapata and Frank Keller
19 January 2018
School of Informatics, University of Edinburgh
Biological neural networks
• A neuron receives inputs and combines them in the cell body.
• If the input reaches a threshold, the neuron may fire (produce an output).
• Some inputs are excitatory, while others are inhibitory.
The relationship of artificial neural networks to the brain

While the brain metaphor is sexy and intriguing, it is also distracting and cumbersome to manipulate mathematically. (Goldberg 2015)
The perceptron: an artificial neuron

Developed by Frank Rosenblatt in 1957.

[Diagram: inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ feed a summation unit $\Sigma$, followed by an activation function $f$, producing output $y$.]

Input function:
$$u(x) = \sum_{i=1}^{n} w_i x_i$$

Activation function (threshold):
$$y = f(u(x)) = \begin{cases} 1, & \text{if } u(x) > \theta \\ 0, & \text{otherwise} \end{cases}$$

Activation state: 0 or 1 (or $-1$ or 1).
• Inputs are in the range [0, 1], where 0 is “off” and 1 is “on”.
• Weights can be any real number (positive or negative).
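A minimal sketch of this computation in Python (the function names u and f follow the slide's notation; the example inputs and weights are illustrative assumptions, not from the slides):

```python
# Forward computation of a threshold perceptron.

def u(x, w):
    """Input function: weighted sum u(x) = sum_i w_i * x_i."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

def f(ux, theta):
    """Threshold activation: fire (1) iff the weighted sum exceeds theta."""
    return 1 if ux > theta else 0

# Example: one excitatory weight outweighing two inhibitory ones.
x = [1, 1, 0]
w = [0.8, -0.2, -0.5]
print(f(u(x, w), theta=0.5))  # -> 1, since 0.8 - 0.2 = 0.6 > 0.5
```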
Perceptrons can represent logic functions

Perceptron for AND: weights $w_1 = w_2 = 0.5$; activation rule: if $\sum \geq 1$ then 1 else 0.

x1  x2  x1 AND x2
0   0   0
0   1   0
1   0   0
1   1   1

Worked examples:
• Input (0, 1): $0 \cdot 0.5 + 1 \cdot 0.5 = 0.5 < 1$, so the output is 0.
• Input (1, 1): $1 \cdot 0.5 + 1 \cdot 0.5 = 1 \geq 1$, so the output is 1.
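As a check, a small Python sketch (the helper name perceptron is our choice, not from the slides) evaluates these weights on all four inputs:

```python
# Verify the AND perceptron: weights (0.5, 0.5) and the rule
# "if the weighted sum >= 1 then 1 else 0".

def perceptron(x, w, theta):
    """Output 1 iff the weighted input sum reaches the threshold theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        out = perceptron((x1, x2), w=(0.5, 0.5), theta=1)
        print(x1, x2, "->", out)  # matches the AND truth table
```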
Perceptrons can represent logic functions

Perceptron for OR: weights $w_1 = w_2 = 0.5$; activation rule: if $\sum \geq 0.5$ then 1 else 0.

x1  x2  x1 OR x2
0   0   0
0   1   1
1   0   1
1   1   1

Worked example:
• Input (0, 1): $0 \cdot 0.5 + 1 \cdot 0.5 = 0.5 \geq 0.5$, so the output is 1.
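Reusing the perceptron helper from the AND sketch, only the threshold changes:

```python
# The same weights with threshold 0.5 implement OR: only input (0, 0)
# has a weighted sum below 0.5.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron((x1, x2), w=(0.5, 0.5), theta=0.5))
```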
How would you represent NOT(OR)?

Perceptron for NOT(OR): weights ???; activation rule: if $\sum \geq$ ??? then 1 else 0.

x1  x2  NOT(x1 OR x2)
0   0   1
0   1   0
1   0   0
1   1   0
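If you want to check a guess (or find one), a brute-force search over a small grid of candidate weights and thresholds is enough. This sketch, including the grid, is an illustrative assumption rather than part of the slides:

```python
# Search for weights and a threshold that reproduce the NOT(OR) table,
# reusing the perceptron helper defined earlier.
nor_table = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0}
candidates = [v / 2 for v in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
for w1 in candidates:
    for w2 in candidates:
        for theta in candidates:
            if all(perceptron(x, (w1, w2), theta) == t
                   for x, t in nor_table.items()):
                print("found:", w1, w2, theta)
                # Negative weights do the trick, e.g. w1 = w2 = -0.5
                # with threshold 0.
```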
Perceptrons are linear classifiers

[Diagram: perceptron with a constant input $x_0 = -1$ weighted by $w_0$, and inputs $x_1, \dots, x_n$ weighted by $w_1, \dots, w_n$, computing $\sum_{i=0}^{n} w_i x_i$.]
Perceptrons are linear classifiers, i.e., they can only separate points with a hyperplane (a straight line in two dimensions).

[Figure: two classes of points in the plane separated by a straight line.]
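Concretely, the decision rule $w_1 x_1 + w_2 x_2 \geq \theta$ splits the plane along the line $w_1 x_1 + w_2 x_2 = \theta$. A sketch using the AND weights from earlier (the extra test point is an illustrative assumption):

```python
# The AND perceptron's boundary is 0.5*x1 + 0.5*x2 = 1, i.e. x1 + x2 = 2.
w1, w2, theta = 0.5, 0.5, 1.0
for point in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]:
    side = w1 * point[0] + w2 * point[1] - theta
    label = "positive side" if side >= 0 else "negative side"
    print(point, label)
# Only points on or above the line x1 + x2 = 2 are classified as 1.
```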
Perceptrons can learn logic functions from examples

Give some examples to the perceptron:

N  input x    target t  output o
1  (0,1,0,0)  1         0
2  (1,0,0,0)  0         0
3  (0,1,1,1)  0         1
4  (1,0,1,0)  0         1
5  (1,1,1,1)  1         0
6  (0,1,0,0)  1         1
...

• Input: a vector of 1’s and 0’s — a feature vector.
• Target: the desired output, a 1 or 0.
• Output: what the perceptron actually produces.
• How do we efficiently find the weights and threshold?
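Written out as data, the examples above might look like this in Python (the variable name training_data is our choice; the learning sketch later reuses it):

```python
# The training examples from the table, as (input vector, target) pairs.
training_data = [
    ((0, 1, 0, 0), 1),
    ((1, 0, 0, 0), 0),
    ((0, 1, 1, 1), 0),
    ((1, 0, 1, 0), 0),
    ((1, 1, 1, 1), 1),
    ((0, 1, 0, 0), 1),
]
```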
Learning

Q1: Choosing weights and threshold θ for the perceptron by hand is not easy! What’s an effective way to learn the weights and threshold from examples?

A1: We use a learning algorithm that adjusts the weights and threshold based on examples.

http://www.youtube.com/watch?v=vGwemZhPlsA&feature=youtu.be
Simplify by converting θ into a weight

$$\sum_{i=1}^{n} w_i x_i > \theta$$

$$\sum_{i=1}^{n} w_i x_i - \theta > 0$$

$$w_1 x_1 + w_2 x_2 + \dots + w_n x_n - \theta > 0$$

$$w_1 x_1 + w_2 x_2 + \dots + w_n x_n + \theta(-1) > 0$$

[Diagram: the same perceptron, now with an extra constant input $x_0 = -1$ whose weight is $w_0 = \theta$.]
Let $x_0 = -1$ be an extra input with fixed value $-1$, whose weight is $w_0 = \theta$. Now our activation function is:

$$y = f(u(x)) = \begin{cases} 1, & \text{if } u(x) > 0 \\ 0, & \text{otherwise} \end{cases}$$

where $u(x) = \sum_{i=0}^{n} w_i x_i$ now includes the bias term.
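A sketch of this bias trick in Python (the helper name with_bias is an assumption; note the comparison is now the strict test u(x) > 0, so the AND threshold is nudged slightly below 1 to keep the same behaviour):

```python
# Fold theta into the weight vector: prepend a constant input x0 = -1
# with weight w0 = theta, so u(x) > theta becomes simply u(x) > 0.
def with_bias(x, w, theta):
    xs = [-1] + list(x)       # x0 = -1, a fixed pseudo-input
    ws = [theta] + list(w)    # w0 = theta, now just another weight
    u = sum(wi * xi for wi, xi in zip(ws, xs))
    return 1 if u > 0 else 0

# The AND perceptron again, with no explicit threshold in the rule.
# Theta is set to 0.9 so that input (1, 1), whose weighted sum is
# exactly 1, still fires under the strict ">" test.
print(with_bias((1, 1), w=(0.5, 0.5), theta=0.9))  # -> 1
print(with_bias((0, 1), w=(0.5, 0.5), theta=0.9))  # -> 0
```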
Learn by adjusting weights whenever output ≠ target

Intuition: the classification depends on which side of 0 the weighted sum $u(x)$ falls. If the output differs from the target, adjust the weights to push $u(x)$ toward the correct side of 0.

• o = 0 and t = 0: don’t adjust the weights.
• o = 0 and t = 1: u(x) was too low. Make it bigger!
• o = 1 and t = 0: u(x) was too high. Make it smaller!
• o = 1 and t = 1: don’t adjust the weights.
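These four cases are what the standard perceptron update rule $w_i \leftarrow w_i + \eta\,(t - o)\,x_i$ implements; the slide gives only the cases, so treating this formula (and the learning rate and epoch count below) as the intended rule is our assumption. A sketch that trains on the training_data defined earlier:

```python
# Perceptron learning rule with the bias trick: when output o differs
# from target t, move each weight so that u(x) shifts toward the
# correct side of 0.

def train(data, n_features, eta=0.1, epochs=20):
    w = [0.0] * (n_features + 1)  # w[0] plays the role of theta
    for _ in range(epochs):
        for x, t in data:
            xs = [-1] + list(x)   # prepend the constant bias input
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
            if o != t:  # the two "don't adjust" cases leave w unchanged
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
    return w

weights = train(training_data, n_features=4)
print(weights)  # learned bias (first entry) and feature weights
```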