  1. CMU CS11-737 Multilingual NLP: Text Classification and Sequence Labeling (Graham Neubig)

  2. Text Classification
  • Given an input text X, predict an output label y
  • Topic Classification: "I like peaches and pears" → food; "I like Peaches and Herb" → music; other labels: politics, ...
  • Language Identification: "I like peaches and pears" → English; "桃と梨が好き" → Japanese; other labels: German, ...
  • Sentiment Analysis (sentence/document-level): "I like peaches and pears" → positive; "I hate peaches and pears" → negative; other labels: neutral, ...
  • ... and many many more!

  3. Sequence Labeling
  • Given an input text X, predict an output label sequence Y of equal length!
  • Part-of-Speech Tagging: "He saw two birds" → PRON VERB NUM NOUN
  • Lemmatization: "He saw two birds" → he see two bird
  • Morphological Tagging: "He saw two birds" → PronType=Prs | Tense=Past,VerbForm=Fin | NumType=Card | Number=Plur
  • ... and more!

  4. Span Labeling
  • Given an input text X, predict output spans and their labels Y
  • Named Entity Recognition: [Graham Neubig]PER is teaching at [Carnegie Mellon University]ORG
  • Syntactic Chunking: [Graham Neubig]NP [is teaching]VP at [Carnegie Mellon University]NP
  • Semantic Role Labeling: [Graham Neubig]Actor [is teaching]Predicate at [Carnegie Mellon University]Location
  • ... and more!

  5. Span Labeling as Sequence Labeling
  • Predict Beginning, In, and Out (BIO) tags for each word in a span:
    Graham/B-PER Neubig/I-PER is/O teaching/O at/O Carnegie/B-ORG Mellon/I-ORG University/I-ORG
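To make the BIO conversion concrete, here is a minimal Python sketch (not from the slides; the (start, end, label) span format with an exclusive end index is an assumption):

```python
# Hypothetical sketch of converting labeled spans to BIO tags.
# Span format assumed: (start_word_index, end_word_index_exclusive, label).

def spans_to_bio(words, spans):
    """Produce one BIO tag per word from (start, end, label) spans."""
    tags = ["O"] * len(words)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first word of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining words of the span
    return tags

words = "Graham Neubig is teaching at Carnegie Mellon University".split()
spans = [(0, 2, "PER"), (5, 8, "ORG")]
print(spans_to_bio(words, spans))
# ['B-PER', 'I-PER', 'O', 'O', 'O', 'B-ORG', 'I-ORG', 'I-ORG']
```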

  6. Text Segmentation
  • Given an input text X, split it into segmented text Y
  • Tokenization: A well-conceived "thought exercise." → A well - conceived " thought exercise . "
  • Word Segmentation: 外国人参政権 → 外国 人 参政 権 ("foreign people voting rights") vs. 外国 人参 政権 ("foreign carrot government")
  • Morphological Segmentation: Köpekler → Köpek + ler (dog, Number=Plural) vs. Köpekle + r (dog_paddle, Tense=Aorist)
  • Performed with rule-based methods, or with span labeling models

  7. Modeling for Sequence Labeling/Classification

  8. How do we Make Predictions?
  • Given an input text X
  • Extract features H
  • Predict labels Y
  • Text Classification: "I like peaches" → Feature Extractor → Predict → positive
  • Sequence Labeling: "I like peaches" → Feature Extractor → Predict (once per word) → PRON VERB NOUN

  9. A Simple Extractor: Bag of Words (BOW)
  • Look up a vector for each word ("I", "like", "peaches"), add the vectors together, and predict label probabilities from the sum
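As a sketch of this extractor (the vocabulary size and label set below are assumptions, not from the slides), the whole model is one lookup per word plus a sum:

```python
import torch
import torch.nn as nn

# Minimal bag-of-words classifier sketch: each word's looked-up vector is
# directly a score vector over labels; summing them gives the sentence scores.
class BOWClassifier(nn.Module):
    def __init__(self, vocab_size, num_labels):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, num_labels)
        self.bias = nn.Parameter(torch.zeros(num_labels))

    def forward(self, word_ids):  # word_ids: LongTensor of shape (seq_len,)
        scores = self.lookup(word_ids).sum(dim=0) + self.bias
        return torch.softmax(scores, dim=-1)  # label probabilities

model = BOWClassifier(vocab_size=1000, num_labels=2)
probs = model(torch.tensor([4, 7, 42]))  # hypothetical ids for "I like peaches"
```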

  10. A Simple Predictor: Linear Transform + Softmax
  p = softmax(W·h + b)
  • Softmax converts arbitrary scores into probabilities:
    pᵢ = exp(sᵢ) / Σⱼ exp(sⱼ)
    e.g. s = [-3.2, -2.9, 1.0, 2.2, 0.6, …] → p = [0.002, 0.003, 0.329, 0.444, 0.090, …]
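A minimal NumPy sketch of this predictor (shapes and values are illustrative assumptions, not from the slides):

```python
import numpy as np

# Softmax turns an arbitrary score vector s into a probability distribution p.
def softmax(s):
    e = np.exp(s - s.max())  # subtract the max for numerical stability
    return e / e.sum()

W = np.random.randn(4, 8)   # (num_labels, feature_dim), assumed shapes
b = np.zeros(4)
h = np.random.randn(8)      # features from the extractor
p = softmax(W @ h + b)      # label probabilities; p.sum() == 1.0
```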

  11. Problem: Language is not a Bag of Words!
  • "I don't love pears" vs. "There's nothing I don't love about pears": nearly the same words, opposite sentiment

  12. Better Featurizers
  • Bag of n-grams (sketched below)
  • Syntax-based features (e.g. subject-object pairs)
  • Neural networks:
    • Recurrent neural networks
    • Convolutional networks
    • Self-attention
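For the first of these, a bag-of-n-grams featurizer is a small extension of BOW that can see "don't love" as a unit; a hypothetical sketch:

```python
from collections import Counter

# Count contiguous word n-grams up to length n_max; multi-word features let
# the model distinguish "don't love" from "love" alone.
def ngram_features(words, n_max=2):
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats[" ".join(words[i:i + n])] += 1
    return feats

print(ngram_features("I don't love pears".split()))
# Counter({'I': 1, "don't": 1, ..., "don't love": 1, 'love pears': 1})
```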

  13. What is a Neural Net?: Computation Graphs

  14. “Neural” Nets
  • Original motivation: neurons in the brain (image credit: Wikipedia)
  • Current conception: computation graphs built from functions such as
    f(x₁, x₂, x₃) = Σᵢ xᵢ,  f(M, v) = Mv,  f(U, V) = UV,  f(u, v) = u·v,  f(u) = uᵀA
    over values such as A, b, x, c

  15. expression: y = xᵀAx + b·x + c
  graph: a node is a {tensor, matrix, vector, scalar} value (here, the input node x)

  16. expression: y = xᵀAx + b·x + c
  graph: an edge represents a function argument; a node with an incoming edge is a function of that edge's tail node. A node knows how to compute its value, and the value of its derivative w.r.t. each argument (edge) times a derivative of an arbitrary input F:
    (∂f(u)/∂u)ᵀ · ∂F/∂f(u)
  (shown here for the transpose node f(u) = uᵀ)

  17. expression: y = xᵀAx + b·x + c
  graph: functions can be nullary, unary, binary, … n-ary; often they are unary or binary (e.g. f(u) = uᵀA, f(U, V) = UV)

  18. expression: y = xᵀAx + b·x + c
  graph: computation graphs are generally directed and acyclic (nodes so far: x, f(u) = uᵀA, f(U, V) = UV, f(M, v) = Mv)

  19. expression: y = xᵀAx + b·x + c
  graph: the composed node f(x, A) = xᵀAx, with derivatives
    ∂f(x, A)/∂x = (Aᵀ + A)x
    ∂f(x, A)/∂A = xxᵀ
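These derivative rules can be checked numerically; a small NumPy sanity check (not from the slides):

```python
import numpy as np

# Verify ∂f/∂x = (A^T + A) x for f(x, A) = x^T A x by central differences.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

f = lambda v: v @ A @ v
analytic = (A.T + A) @ x

eps = 1e-6
numeric = np.array([
    (f(x + eps * np.eye(3)[i]) - f(x - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```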

  20. expression: y = xᵀAx + b·x + c
  graph: adding the remaining nodes b, c, f(u, v) = u·v, and f(x₁, x₂, x₃) = Σᵢ xᵢ completes the expression

  21. expression: y = xᵀAx + b·x + c
  graph: the sum node computes y; variable names are just labelings of nodes.

  22. Algorithms (1)
  • Graph construction
  • Forward propagation: in topological order, compute the value of each node given its inputs (see the sketch below)
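A minimal sketch of forward propagation in code; the Node representation here is a simplification assumed for illustration, not a real framework API:

```python
# A node holds a function and its argument nodes; visiting nodes in
# topological order guarantees every argument is computed before it is used.
class Node:
    def __init__(self, fn=None, args=(), value=None):
        self.fn, self.args, self.value = fn, args, value

def forward(nodes_in_topological_order):
    for node in nodes_in_topological_order:
        if node.fn is not None:  # leaf nodes keep their given value
            node.value = node.fn(*(a.value for a in node.args))
    return nodes_in_topological_order[-1].value
```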

  23.-30. Forward Propagation: stepping through the graph for y = xᵀAx + b·x + c in topological order, the intermediate values xᵀ, then xᵀA, then b·x, then xᵀAx are computed in turn, and finally summed with c to give xᵀAx + b·x + c.
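Reusing the hypothetical Node/forward sketch from slide 22, the same walk-through can be reproduced for concrete (assumed) values of x, A, b, and c:

```python
import numpy as np

# Build the graph for y = x^T A x + b.x + c and evaluate it in topological order.
x = Node(value=np.array([1.0, 2.0]))
A = Node(value=np.eye(2))
b = Node(value=np.array([3.0, 4.0]))
c = Node(value=1.0)

xT_A   = Node(lambda x, A: x @ A, (x, A))      # x^T A
xT_A_x = Node(lambda u, x: u @ x, (xT_A, x))   # x^T A x
b_x    = Node(lambda b, x: b @ x, (b, x))      # b . x
y      = Node(lambda *vals: sum(vals), (xT_A_x, b_x, c))

print(forward([x, A, b, c, xT_A, xT_A_x, b_x, y]))  # 17.0 = 5 + 11 + 1
```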

  31. Algorithms (2)
  • Back-propagation:
    • Process the nodes in reverse topological order
    • Calculate the derivatives of the parameters with respect to the final value (this is usually a "loss function", a value we want to minimize)
  • Parameter update:
    • Move the parameters in the opposite direction of this derivative: W -= α * dl/dW

  32. Back Propagation: the same graph as before, with derivatives computed in reverse topological order.
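In a real framework the backward pass and update are automated; a minimal PyTorch sketch of slides 31-32 on the running expression (values assumed for illustration):

```python
import torch

# Autograd back-propagates through the graph for y = x^T A x + b.x + c,
# then we take one gradient-descent step on the parameter A.
x = torch.tensor([1.0, 2.0])
A = torch.eye(2, requires_grad=True)
b = torch.tensor([3.0, 4.0])
c = torch.tensor(1.0)

y = x @ A @ x + b @ x + c   # forward pass builds the graph
y.backward()                # reverse topological order: fills A.grad (= x x^T)

alpha = 0.1
with torch.no_grad():
    A -= alpha * A.grad     # parameter update: W -= alpha * dl/dW
```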

  33. Neural Network Frameworks (examples used in this class)

  34. Basic Process in (Dynamic) Neural Network Frameworks
  • Create a model
  • For each example:
    • create a graph that represents the computation you want
    • calculate the result of that computation
    • if training, perform back-propagation and update
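Put together, the per-example loop might look like the following PyTorch sketch; the model, data, and learning rate are placeholders, not from the slides:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 4)                            # stand-in "create a model"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
data = [(torch.randn(8), 2), (torch.randn(8), 0)]  # dummy (features, label) pairs

for features, label in data:
    scores = model(features)                       # build the graph for this example
    loss = loss_fn(scores.unsqueeze(0), torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()                                # back-propagation
    optimizer.step()                               # parameter update
```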

  35. Recurrent Neural Networks

  36. Long-distance Dependencies in Language
  • Agreement in number, gender, etc.:
    "He does not have very much confidence in himself."
    "She does not have very much confidence in herself."
  • Selectional preference:
    "The reign has lasted as long as the life of the queen."
    "The rain has lasted as long as the life of the clouds."

  37. Recurrent Neural Networks (Elman 1990)
  • Tools to "remember" information
  • Feed-forward NN: lookup → transform, with no context carried between steps
  • Recurrent NN: lookup → transform, with a context vector fed back in at each step

  38. Unrolling in Time
  • What does featurizing a sequence look like? The same RNN cell is applied to each word in turn ("I", "like", "these", "pears"), passing its hidden state forward
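A minimal PyTorch sketch of this unrolling; the vocabulary and dimensions are assumed for illustration, and nn.RNNCell implements an Elman-style cell:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(4, 16)              # toy vocab of 4 words, assumed sizes
cell = nn.RNNCell(16, 32)              # input dim 16, hidden ("context") dim 32

word_ids = torch.tensor([0, 1, 2, 3])  # hypothetical ids for "I like these pears"
h = torch.zeros(1, 32)                 # initial context
for x in emb(word_ids):                # one RNN step per word
    h = cell(x.unsqueeze(0), h)        # h now reflects this word and all earlier ones
```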
