Natural Language Processing Lecture 27: Conclusion
Levels of Linguistic Knowledge: from “shallower” to “deeper”: phonetics (spoken) or orthography (written), phonology, morphology, syntax, semantics, pragmatics, discourse
uygarlaştıramadıklarımızdanmışsınızcasına “(behaving) as if you are among those whom we could not civilize”
Morphological breakdown:
• uygar “civilized”
• +laş “become”
• +tır “cause to”
• +ama “not able”
• +dık past participle
• +lar plural
• +ımız first person plural possessive (“our”)
• +dan ablative case (“from/among”)
• +mış past
• +sınız second person plural (“y’all”)
• +casına finite verb → adverb (“as if”)
Finite-State Automaton
• Q: a finite set of states
• q_0 ∈ Q: a special start state
• F ⊆ Q: a set of final states
• Σ: a finite alphabet
• Transitions: each transition goes from a state q_i to a state q_j and is labeled with a string s ∈ Σ*
• Encodes a set of strings that can be recognized by following paths from q_0 to some state in F.
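The definition above can be sketched as a small recognizer. This is a minimal illustration, not the lecture's own code: it assumes a deterministic automaton whose transitions are labeled with single symbols (the slide allows any string s ∈ Σ*), and the "sheep language" automaton and the names `FSA` and `sheep` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


class FSA:
    """Deterministic finite-state automaton with single-symbol transitions.

    Simplifying assumption: each transition consumes exactly one symbol,
    whereas the slide allows transitions labeled with strings in Sigma*.
    """

    def __init__(self, start: str, finals: Set[str],
                 transitions: Dict[Tuple[str, str], str]) -> None:
        self.start = start              # q_0
        self.finals = finals            # F
        self.transitions = transitions  # (state, symbol) -> next state

    def accepts(self, symbols: Iterable[str]) -> bool:
        """Follow a path from the start state; accept if it ends in F."""
        state = self.start
        for sym in symbols:
            key = (state, sym)
            if key not in self.transitions:
                return False            # no outgoing arc for this symbol
            state = self.transitions[key]
        return state in self.finals


# Hypothetical example automaton for the language b a a+ !
sheep = FSA(
    start="q0",
    finals={"q4"},
    transitions={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",   # loop: any number of extra a's
        ("q3", "!"): "q4",
    },
)

print(sheep.accepts("baa!"), sheep.accepts("ba!"))
```

A dictionary keyed by (state, symbol) keeps the lookup constant-time; a nondeterministic automaton would instead map each key to a set of next states.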
Levels of Linguistic Knowledge, with ambiguity: phonetics (spoken) or orthography (written), phonology, morphology, syntax, semantics, pragmatics, discourse
Noisy Channel: source → y (what you want) → channel → x (what you see) → decode
Noisy Channel (tagging): y = NN VB RB, x = “Cats meow often”
Noisy Channel (translation): y = “How are you?”, x = “你好吗?”
Noisy Channel (speech recognition): y = “Okay, Google”, x = the spoken audio
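The decoding step in the diagrams above is usually written with Bayes' rule. The following is the standard textbook formulation, stated here as an assumption about what the slides intend; p(y) is the source model and p(x | y) is the channel model.

```latex
\hat{y}
  = \arg\max_{y} \, p(y \mid x)
  = \arg\max_{y} \, \frac{p(y)\, p(x \mid y)}{p(x)}
  = \arg\max_{y} \, p(y)\, p(x \mid y)
```

The denominator p(x) can be dropped because it does not depend on y.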
Starting and Stopping Unigram model: ... Bigram model: ... Trigram model: ...
Language Modeling Questions
• Why do we use context?
• What does smoothing do, and why is it necessary?
• What do we use to evaluate language models?
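One way to make these questions concrete is a tiny bigram model. The sketch below is illustrative only: it assumes add-alpha (Laplace) smoothing and perplexity as the evaluation measure, pads each sentence with start and stop symbols, and assumes a closed vocabulary (no unknown-word handling). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

START, STOP = "<s>", "</s>"
Model = Tuple[Counter, Counter, Set[str]]


def train_bigram(sentences: List[List[str]]) -> Model:
    """Count contexts and bigrams, padding each sentence with START/STOP."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab: Set[str] = {STOP}            # STOP can be predicted; START cannot
    for sent in sentences:
        padded = [START] + sent + [STOP]
        vocab.update(sent)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def prob(word: str, prev: str, model: Model, alpha: float = 1.0) -> float:
    """Add-alpha smoothed estimate of p(word | prev)."""
    context_counts, bigram_counts, vocab = model
    numerator = bigram_counts[(prev, word)] + alpha
    denominator = context_counts[prev] + alpha * len(vocab)
    return numerator / denominator


def perplexity(sentences: List[List[str]], model: Model,
               alpha: float = 1.0) -> float:
    """2 ** (negative average log2-probability per predicted token)."""
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        padded = [START] + sent + [STOP]
        for prev, word in zip(padded, padded[1:]):
            log_prob += math.log2(prob(word, prev, model, alpha))
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)


model = train_bigram([["cats", "meow"], ["cats", "purr"]])
print(perplexity([["cats", "meow"]], model))   # word order seen in training
print(perplexity([["meow", "cats"]], model))   # unseen order: higher perplexity
```

Without smoothing, the unseen bigram ("meow", "cats") would get probability zero and the perplexity of the second sentence would be undefined; smoothing reserves some probability mass for such events.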
Tagging
Broad POS categories
• open classes: nouns, verbs, adjectives, adverbs
• closed classes: prepositions, particles, determiners, numerals, pronouns, conjunctions, auxiliary verbs
Syntax
Parsing
• CKY vs. Earley’s Algorithm
– Both dynamic programming
– CNF vs. general forms
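CKY Algorithm: Chart for “book this flight through Houston” (each row starts at the named word; columns extend the span one word at a time; “-” marks an empty cell)
book:     Noun, Verb | -  | VP, S | -  | S
this:     Det        | NP | -     | NP
flight:   Noun       | -  | -
through:  Prep       | PP
Houston:  PNoun, NP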
CKY Equations
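The chart-filling recurrence can be sketched as a CKY recognizer. This is a minimal illustration under stated assumptions: the grammar is in Chomsky normal form, and the lexicon and binary rules below are hypothetical, chosen so that the filled cells resemble the chart for “book this flight through Houston”. The function name `cky_recognize` is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Chart = Dict[Tuple[int, int], Set[str]]


def cky_recognize(words: List[str],
                  lexicon: Dict[str, Set[str]],
                  binary_rules: Dict[Tuple[str, str], Set[str]]) -> Chart:
    """Fill chart[(i, j)] with every nonterminal that spans words[i:j]."""
    n = len(words)
    chart: Chart = defaultdict(set)
    # Width-1 spans come straight from the lexicon.
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(lexicon.get(word, ()))
    # Wider spans combine two smaller, already-filled spans at split point k.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= binary_rules.get((left, right), set())
    return chart


# Hypothetical CNF grammar (not taken from the lecture).
lexicon = {
    "book": {"Noun", "Verb"},
    "this": {"Det"},
    "flight": {"Noun"},
    "through": {"Prep"},
    "Houston": {"PNoun", "NP"},
}
binary_rules = {
    ("Det", "Noun"): {"NP"},
    ("Verb", "NP"): {"VP", "S"},
    ("Prep", "NP"): {"PP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP", "S"},
}

words = "book this flight through Houston".split()
chart = cky_recognize(words, lexicon, binary_rules)
print("S" in chart[(0, len(words))])  # sentence is recognized if S spans everything
```

The three nested loops over width, start position, and split point give the usual O(n³) running time (times a grammar-dependent factor). Storing back-pointers alongside each nonterminal would turn the recognizer into a parser.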
Semantics
Where’s the beef? Sentences from the Brown Corpus. Extracted from the concordancer in The Compleat Lexical Tutor, http://www.lextutor.ca/
chicken
Synsets for dog (n)
• S: (n) dog, domestic dog, Canis familiaris (a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds) "the dog barked all night"
• S: (n) frump, dog (a dull unattractive unpleasant girl or woman) "she got a reputation as a frump"; "she's a real dog"
• S: (n) dog (informal term for a man) "you lucky dog"
• S: (n) cad, bounder, blackguard, dog, hound, heel (someone who is morally reprehensible) "you dirty dog"
• S: (n) frank, frankfurter, hotdog, hot dog, dog, wiener, wienerwurst, weenie (a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll)
• S: (n) pawl, detent, click, dog (a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward)
• S: (n) andiron, firedog, dog, dog-iron (metal supports for logs in a fireplace) "the andirons were too hot to touch"
Entity Linking Mary picked up the ball. She threw it to me.
Semantic Roles
PropBank is a set of verb-sense-specific “frames” with informal descriptions for their arguments. Consider the word “Agree”:
• ARG0: agreer
• ARG1: proposition
• ARG2: other entity agreeing
[The group]ARG0 agreed [it wouldn’t make an offer]ARG1.
Usually [John]ARG0 agrees [with Mary on everything]ARG2.
“Fall (move downward)” in PropBank
• arg1: logical subject, patient, thing falling
• arg2: extent, amount fallen
• arg3: starting point
• arg4: ending point
• argM-loc: medium
Sales fell to $251.2 million from $278.8 million.
The average junk bond fell by 4.2%.
The meteor fell through the atmosphere, crashing into Cambridge.
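A frame like this is easy to picture as a small data structure. The sketch below is not PropBank's file format or API; the classes `Frame` and `Annotation` are hypothetical, and the argument spans assigned to the first example sentence are one reading of the role descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Frame:
    """A verb-sense-specific frame: role label -> informal description."""
    predicate: str
    roles: Dict[str, str]


@dataclass
class Annotation:
    """One sentence with spans assigned to the frame's role labels."""
    frame: Frame
    sentence: str
    args: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject labels the frame does not define.
        unknown = set(self.args) - set(self.frame.roles)
        if unknown:
            raise ValueError(f"unknown role labels: {sorted(unknown)}")

    def describe(self) -> Dict[str, str]:
        """Map each role description to the span that fills it."""
        return {self.frame.roles[label]: span
                for label, span in self.args.items()}


FALL = Frame(
    predicate="fall (move downward)",
    roles={
        "arg1": "thing falling",
        "arg2": "extent, amount fallen",
        "arg3": "starting point",
        "arg4": "ending point",
        "argM-loc": "medium",
    },
)

# Assumed labeling of the first example sentence.
sales = Annotation(
    frame=FALL,
    sentence="Sales fell to $251.2 million from $278.8 million.",
    args={
        "arg1": "Sales",
        "arg4": "to $251.2 million",
        "arg3": "from $278.8 million",
    },
)
print(sales.describe())
```

Because the role labels are specific to one verb sense, the same label (for example arg2) can mean something different in the frame for another predicate such as “agree”.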
MRL #1: First-Order Logic
Function: DressCode(ThePorch)
Predicates: Serves(UnionGrill, AmericanFood), Restaurant(UnionGrill)
Have(Speaker, FiveDollars) ∧ ¬Have(Speaker, LotOfTime)
∀x Person(x) ⇒ Have(x, FiveDollars)
∃x,y Person(x) ∧ Restaurant(y) ∧ ¬HasVisited(x,y)
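To see what these formulas assert, it can help to evaluate them against a small finite model. The sketch below is illustrative: the domain and the facts in it (who counts as a person, who has visited what) are invented for the example, and the function names are hypothetical.

```python
from itertools import product
from typing import Set, Tuple

Pair = Tuple[str, str]

# Hypothetical finite model; none of these facts come from the lecture.
domain: Set[str] = {"Speaker", "Mary", "UnionGrill", "ThePorch",
                    "FiveDollars", "LotOfTime"}
person: Set[str] = {"Speaker", "Mary"}
restaurant: Set[str] = {"UnionGrill", "ThePorch"}
have: Set[Pair] = {("Speaker", "FiveDollars"), ("Mary", "FiveDollars")}
has_visited: Set[Pair] = {("Speaker", "UnionGrill"),
                          ("Speaker", "ThePorch"),
                          ("Mary", "UnionGrill")}


def implies(p: bool, q: bool) -> bool:
    """Material implication: p => q."""
    return (not p) or q


def money_but_no_time(have: Set[Pair]) -> bool:
    """Have(Speaker, FiveDollars) AND NOT Have(Speaker, LotOfTime)."""
    return (("Speaker", "FiveDollars") in have
            and ("Speaker", "LotOfTime") not in have)


def every_person_has_five_dollars(have: Set[Pair]) -> bool:
    """FORALL x. Person(x) => Have(x, FiveDollars)."""
    return all(implies(x in person, (x, "FiveDollars") in have)
               for x in domain)


def someone_has_unvisited_restaurant(has_visited: Set[Pair]) -> bool:
    """EXISTS x, y. Person(x) AND Restaurant(y) AND NOT HasVisited(x, y)."""
    return any(x in person and y in restaurant and (x, y) not in has_visited
               for x, y in product(domain, repeat=2))


print(money_but_no_time(have),
      every_person_has_five_dollars(have),
      someone_has_unvisited_restaurant(has_visited))
```

Quantifiers range over the whole domain, which is why the universal formula uses an implication: objects that are not persons satisfy it vacuously.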
First Order Logic: Advantages • Flexible • Well-understood • Widely used
EM
• We often have unlabeled or incomplete data
• EM is an algorithm for learning without labels, e.g., “classification” without classes
• Pick random centroids
• Iterate the following:
– E-step: use centroids to label the data
– M-step: compute centroids using the labeled data
• Keep doing this until labels don’t change
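The loop described above is k-means clustering, which can be viewed as a hard-assignment special case of EM. The sketch below follows those steps under some assumptions: Euclidean distance, points given as tuples, and an optional `init` argument (hypothetical, added so that a run can be made reproducible instead of random).

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int,
           init: Optional[Sequence[Point]] = None,
           seed: int = 0, max_iter: int = 100
           ) -> Tuple[List[int], List[Point]]:
    """Hard EM: alternate labeling (E-step) and centroid updates (M-step)."""
    if init is None:
        # Pick random centroids from the data.
        centroids = random.Random(seed).sample(list(points), k)
    else:
        centroids = list(init)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # E-step: label each point with its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break                        # labels did not change: converged
        labels = new_labels
        # M-step: recompute each centroid from the points labeled with it.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:                  # keep the old centroid if empty
                centroids[c] = mean(members)
    return labels, centroids


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(data, k=2, init=[(0.0, 0.0), (0.0, 1.0)])
print(labels)
print(centroids)
```

Full EM for a mixture model would replace the hard labels with posterior probabilities for each cluster, and the centroid update with a weighted mean.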
NLP Uses
• Answer questions using the Web
• Translate documents from one language to another
• Do library research; summarize
• Manage messages intelligently
• Help make informed decisions
• Follow directions given by any user
• Fix your spelling or grammar
• Grade exams
• Write poems or novels
• Listen and give advice
• Estimate public opinion
• Read everything and make predictions
• Interactively help people learn
• Help disabled people
• Help refugees/disaster victims
• Document or reinvigorate indigenous languages
More NLP ...
• Language Technologies Minor
– 4 LT courses plus LT project
• 5th-year Master’s in Language Technologies
More NLP Courses
• 11-492/692 Speech Processing
– Fall: Alan W Black
– Practical Systems for Speech
• 11-711 Algorithms and NLP
– Fall: Yulia Tsvetkov, Robert Frederking
– Research oriented
• 11-727 Computational Semantics
– Spring: Ed Hovy, Teruko Mitamura
More NLP Courses
• 11-747 Neural Networks for NLP
– Spring: Graham Neubig
• 11-830 Computational Ethics for NLP
– Spring: Yulia Tsvetkov, Alan W Black
• 11-777 Advanced Multimodal ML
– Fall: Louis-Philippe Morency
– Visual, Gesture, Speech
• Most Neural Net Classes
– Always involve NLP