NLP Programming Tutorial 3 – The Perceptron Algorithm
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Prediction Problems

Given x, predict y:

● A book review → Is it positive? (Binary prediction, 2 choices)
  “Oh, man I love this book!” → yes
  “This book is so boring...” → no
● A tweet → Its language (Multi-class prediction, several choices)
  “On the way to the park!” → English
  “公園に行くなう!” → Japanese
● A sentence → Its syntactic parse (Structured prediction, millions of choices)
  “I read a book” → (S (NP (N I)) (VP (VBD read) (NP (DET a) (NN book))))
Example we will use:

● Given an introductory sentence from Wikipedia
● Predict whether the article is about a person

Given → Predict:
“Gonso was a Sanron sect priest (754-827) in the late Nara and early Heian periods.” → Yes!
“Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture.” → No!

● This is binary classification (of course!)
Performing Prediction
How do We Predict?

Gonso was a Sanron sect priest ( 754 – 827 ) in the late Nara and early Heian periods .
● Contains “priest” → probably person!
● Contains “(<#>-<#>)” → probably person!

Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura , Maizuru City , Kyoto Prefecture .
● Contains “site” → probably not person!
● Contains “Kyoto Prefecture” → probably not person!
Combining Pieces of Information

● Each element that helps us predict is a feature:
  contains “priest”, contains “(<#>-<#>)”, contains “site”, contains “Kyoto Prefecture”
● Each feature has a weight, positive if it indicates “yes”, and negative if it indicates “no”:
  w contains “priest” = 2
  w contains “(<#>-<#>)” = 1
  w contains “site” = -3
  w contains “Kyoto Prefecture” = -1
● For a new example, sum the weights of the features it contains:
  “Kuya (903-972) was a priest born in Kyoto Prefecture.” → 2 + (-1) + 1 = 2
● If the sum is at least 0: “yes”, otherwise: “no”
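As a sanity check, the sum above is easy to reproduce in a few lines of Python (a minimal sketch using exactly the four weights from this slide):

    # Weights from this slide: positive = evidence for "person"
    w = {"priest": 2, "(<#>-<#>)": 1, "site": -3, "Kyoto Prefecture": -1}

    # Features present in "Kuya (903-972) was a priest born in Kyoto Prefecture."
    features = ["priest", "(<#>-<#>)", "Kyoto Prefecture"]

    score = sum(w[f] for f in features)
    print(score)                          # 2 + 1 + (-1) = 2
    print("yes" if score >= 0 else "no")  # "yes": the article is about a person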
Let me Say that in Math!

y = sign(w ⋅ φ(x)) = sign(∑_{i=1}^{I} w_i ⋅ φ_i(x))

● x: the input
● φ(x): vector of feature functions {φ_1(x), φ_2(x), …, φ_I(x)}
● w: the weight vector {w_1, w_2, …, w_I}
● y: the prediction, +1 if “yes”, -1 if “no”
● (sign(v) is +1 if v >= 0, -1 otherwise)
Example Feature Functions: Unigram Features

● Equal to “number of times a particular word appears”

x = A site , located in Maizuru , Kyoto

φ_unigram “A”(x) = 1       φ_unigram “site”(x) = 1      φ_unigram “located”(x) = 1
φ_unigram “in”(x) = 1      φ_unigram “Maizuru”(x) = 1   φ_unigram “,”(x) = 2
φ_unigram “Kyoto”(x) = 1
φ_unigram “the”(x) = 0     φ_unigram “temple”(x) = 0    … the rest are all 0

● For convenience, we use feature names (φ_unigram “A”) instead of feature indexes (φ_1)
Calculating the Weighted Sum

x = A site , located in Maizuru , Kyoto

φ_unigram “A”(x) = 1        ×  w_unigram “A” = 0         →   0
φ_unigram “site”(x) = 1     ×  w_unigram “site” = -3     →  -3
φ_unigram “located”(x) = 1  ×  w_unigram “located” = 0   →   0
φ_unigram “Maizuru”(x) = 1  ×  w_unigram “Maizuru” = 0   →   0
φ_unigram “,”(x) = 2        ×  w_unigram “,” = 0         →   0
φ_unigram “in”(x) = 1       ×  w_unigram “in” = 0        →   0
φ_unigram “Kyoto”(x) = 1    ×  w_unigram “Kyoto” = 0     →   0
φ_unigram “priest”(x) = 0   ×  w_unigram “priest” = 2    →   0
φ_unigram “black”(x) = 0    ×  w_unigram “black” = 0     →   0
…

w ⋅ φ(x) = -3 → No!
Pseudo Code for Prediction

predict_all(model_file, input_file):
    load w from model_file           # so w[name] = w_name
    for each x in input_file:
        phi = create_features(x)     # so phi[name] = φ_name(x)
        y' = predict_one(w, phi)     # calculate sign(w ⋅ φ(x))
        print y'
Pseudo Code for Predicting a Single Example

predict_one(w, phi):
    score = 0
    for each name, value in phi:     # score = w ⋅ φ(x)
        if name exists in w:
            score += value * w[name]
    if score >= 0:
        return 1
    else:
        return -1
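A runnable Python version of both functions might look like the sketch below. It assumes the model file stores one name/weight pair per line, separated by a tab; the slides do not fix the file format, so adjust this to whatever your train-perceptron writes. create_features is defined on the next slide.

    def predict_one(w, phi):
        # score = w * phi(x); missing features contribute 0
        score = sum(value * w.get(name, 0) for name, value in phi.items())
        return 1 if score >= 0 else -1

    def predict_all(model_file, input_file):
        w = {}
        with open(model_file) as f:
            for line in f:                        # assumed format: name<TAB>weight
                name, weight = line.rstrip("\n").split("\t")
                w[name] = float(weight)
        with open(input_file) as f:
            for x in f:
                phi = create_features(x.strip())
                print(predict_one(w, phi))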
Pseudo Code for Feature Creation (Example: Unigram Features)

create_features(x):
    create map phi
    split x into words
    for word in words:
        phi[“UNI:”+word] += 1    # we add “UNI:” to indicate unigrams
    return phi

● You can modify this function to use other features!
  ● Bigrams?
  ● Other features?
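The pseudocode translates almost directly into Python with a default dictionary (a sketch; the commented-out bigram lines show one possible extension, not something the slides prescribe):

    from collections import defaultdict

    def create_features(x):
        phi = defaultdict(int)
        words = x.split()
        for word in words:
            phi["UNI:" + word] += 1   # "UNI:" marks unigram features
        # Possible extension: bigram features, e.g.
        # for i in range(1, len(words)):
        #     phi["BI:" + words[i-1] + " " + words[i]] += 1
        return phi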
Learning Weights Using the Perceptron Algorithm
Learning Weights

● Manually creating weights is hard
  ● There are many, many potentially useful features
  ● Changing weights changes results in unexpected ways
● Instead, we can learn from labeled data:

y   x
1   FUJIWARA no Chikamori ( year of birth and death unknown ) was a samurai and poet who lived at the end of the Heian period .
1   Ryonen ( 1646 - October 29 , 1711 ) was a Buddhist nun of the Obaku Sect who lived from the early Edo period to the mid-Edo period .
-1  A moat settlement is a village surrounded by a moat .
-1  Fushimi Momoyama Athletic Park is located in Momoyama-cho , Kyoto City , Kyoto Prefecture .
Online Learning

create map w
for I iterations:
    for each labeled pair x, y in the data:
        phi = create_features(x)
        y' = predict_one(w, phi)
        if y' != y:
            update_weights(w, phi, y)

● In other words:
  ● Try to classify each training example
  ● Every time we make a mistake, update the weights
● There are many different online learning algorithms
  ● The simplest is the perceptron
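Assembled with create_features, predict_one, and update_weights (defined on the next slide), the loop becomes a short training function. This sketch assumes each line of the training file is “label<TAB>sentence”, matching the labeled-data table on the previous slide; check the actual data format before relying on it.

    def train_perceptron(data_file, iterations):
        w = {}
        for _ in range(iterations):
            with open(data_file) as f:
                for line in f:
                    label, x = line.rstrip("\n").split("\t")
                    y = int(label)                  # +1 or -1
                    phi = create_features(x)
                    if predict_one(w, phi) != y:    # only update on mistakes
                        update_weights(w, phi, y)
        return w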
Perceptron Weight Update

w ← w + y φ(x)

● In other words:
  ● If y = 1, increase the weights for features in φ(x)
    – Features for positive examples get a higher weight
  ● If y = -1, decrease the weights for features in φ(x)
    – Features for negative examples get a lower weight
→ Every time we update, our predictions get better!

update_weights(w, phi, y):
    for name, value in phi:
        w[name] += value * y
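The pseudocode is nearly valid Python already; the one pitfall is that w[name] += value * y raises a KeyError for features not yet in w when w is a plain dict. A safe sketch:

    def update_weights(w, phi, y):
        # w <- w + y * phi(x), creating entries for unseen features
        for name, value in phi.items():
            w[name] = w.get(name, 0) + value * y

Alternatively, making w a collections.defaultdict(int) lets the original += form work unchanged.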
Example: Initial Update

● Initialize w = 0

x = A site , located in Maizuru , Kyoto     y = -1
w ⋅ φ(x) = 0  →  y' = sign(w ⋅ φ(x)) = 1
y' ≠ y, so update: w ← w + y φ(x)

w_unigram “A” = -1        w_unigram “site” = -1      w_unigram “located” = -1
w_unigram “in” = -1       w_unigram “Maizuru” = -1   w_unigram “Kyoto” = -1
w_unigram “,” = -2
Example: Second Update

x = Shoken , monk born in Kyoto     y = 1
w ⋅ φ(x) = -4  (from w_“,” = -2, w_“in” = -1, w_“Kyoto” = -1)  →  y' = sign(w ⋅ φ(x)) = -1
y' ≠ y, so update: w ← w + y φ(x)

w_unigram “A” = -1        w_unigram “site” = -1      w_unigram “located” = -1
w_unigram “Maizuru” = -1  w_unigram “,” = -1         w_unigram “in” = 0
w_unigram “Kyoto” = 0     w_unigram “Shoken” = 1     w_unigram “monk” = 1
w_unigram “born” = 1
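Both updates can be replayed in code; the expected values in the comments are the ones shown on these two slides (this reuses the Python sketches from earlier):

    w = {}
    for y, x in [(-1, "A site , located in Maizuru , Kyoto"),
                 (1, "Shoken , monk born in Kyoto")]:
        phi = create_features(x)
        if predict_one(w, phi) != y:
            update_weights(w, phi, y)
    print(w["UNI:,"])      # -1 (was -2 after the first update)
    print(w["UNI:Kyoto"])  # 0
    print(w["UNI:monk"])   # 1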
Exercise
Exercise (1)

● Write two programs:
  ● train-perceptron: creates a perceptron model
  ● test-perceptron: reads a perceptron model and outputs one prediction per line
● Test train-perceptron:
  ● Input: test/03-train-input.txt
  ● Answer: test/03-train-answer.txt
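One possible skeleton for train-perceptron, writing the model in the tab-separated format assumed in the earlier prediction sketch (a sketch, not the reference solution; file names come from the command line, and the number of iterations is an arbitrary choice):

    import sys

    if __name__ == "__main__":
        w = train_perceptron(sys.argv[1], iterations=10)  # e.g. test/03-train-input.txt
        with open(sys.argv[2], "w") as f:                 # model file to create
            for name, weight in sorted(w.items()):
                f.write(f"{name}\t{weight}\n")

test-perceptron then just loads that same file and calls predict_all, printing one prediction per line.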
Exercise (2)

● Train a model on data-en/titles-en-train.labeled
● Predict the labels of data-en/titles-en-test.word
● Grade your answers and report next week:
  script/grade-prediction.py data-en/titles-en-test.labeled your_answer
● Extra challenge:
  ● Find places where the model makes a mistake and analyze why
  ● Devise new features that could increase accuracy
Thank You!