

  1. What is NLP? CMSC 473/673 http://www.qwantz.com/index.php?comic=170

  2. Today’s Learning Goals • NLP vs. CL • Terminology: – NLP: vocabulary, token, type, one-hot encoding, dense embedding, parameter/weight, corpus/corpora – Linguistics: lexeme, morphology, syntax, semantics, “discourse” • NLP Tasks (high-level): – Part of speech tagging – Syntactic parsing – Entity id/coreference • Universal Dependencies

  3. http://www.qwantz.com/index.php?comic=170

  4. Natural Language Processing ≈ Computational Linguistics

  5. Natural Language Processing ≈ Computational Linguistics CL: science focus, like computational bio, computational chemistry, computational X

  6. Natural Language Processing ≈ Computational Linguistics NLP: engineering focus (build a system to translate, create a QA system) CL: science focus (computational bio, computational chemistry, computational X)

  7. Natural Language Processing ≈ Computational Linguistics Both have impact in, contribute to, and draw from: Machine Learning, Linguistics, Information Theory, Cognitive Science, Data Science, Psychology, Systems Engineering, Political Science, Logic, Digital Humanities, Theory of Computation, Education

  8. Natural Language Processing ≈ Computational Linguistics NLP: engineering focus (build a system to translate, create a QA system) CL: science focus (computational bio, computational chemistry, computational X) These views can co-exist peacefully

  9. What Are Words? Linguists don’t agree. (Human) language-dependent. White-space separation is sometimes okay (for written English longform). But what about social media? Spoken vs. written language? Other languages?

  10. What Are Words? Tokens vs. Types The film got a great opening and the film went on to become a hit . Type: an element of the vocabulary. Token: an instance of that type in running text. Vocabulary: the set of word types you know. How many of each are in the sentence above?

  11. Terminology: Tokens vs. Types The film got a great opening and the film went on to become a hit . Types (14): The, film, got, a, great, opening, and, the, went, on, to, become, hit, . Tokens (16): The, film, got, a, great, opening, and, the, film, went, on, to, become, a, hit, .

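The token/type count from the slide can be reproduced in a few lines. A minimal sketch, assuming whitespace tokenization (punctuation already split off) and case-sensitive types, so "The" and "the" count as different types:

```python
# Counting tokens vs. types for the sentence from the slides.
from collections import Counter

sentence = "The film got a great opening and the film went on to become a hit ."
tokens = sentence.split()          # one token per whitespace-separated item
type_counts = Counter(tokens)      # one entry per distinct type

print(len(tokens))       # 16 tokens
print(len(type_counts))  # 14 types (the vocabulary size V)
```

"film", "a", and "the" each occur twice as tokens but count once as types, which is why the two numbers differ.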
  13. Representing a Linguistic “Blob” 1. An array of sub-blobs: word → array of characters; sentence → array of words. How do you represent these?

  14. Representing a Linguistic “Blob” 1. An array of sub-blobs: word → array of characters; sentence → array of words 2. Integer representation/one-hot encoding 3. Dense embedding

  15. Representing a Linguistic “Blob” 1. An array of sub-blobs: word → array of characters; sentence → array of words 2. Integer representation/one-hot encoding: let V = vocab size (# types); represent each word type with a unique integer i, where 0 ≤ i < V 3. Dense embedding

  16. Representing a Linguistic “Blob” 1. An array of sub-blobs: word → array of characters; sentence → array of words 2. Integer representation/one-hot encoding: let V = vocab size (# types); represent each word type with a unique integer i, where 0 ≤ i < V. Or equivalently, assign each word w to some index i, where 0 ≤ i < V, and represent w with a V-dimensional binary vector f(w), where f(w)_i = 1 and all other entries are 0 3. Dense embedding

  17. One-Hot Encoding Example • Let our vocab be {a, cat, saw, mouse, happy} • Q: What is V (# types)? • V = # types = 5

  18. One-Hot Encoding Example • Let our vocab be {a, cat, saw, mouse, happy} • V = # types = 5 • Assign: a → 4, cat → 2, saw → 3, mouse → 0, happy → 1 • How do we represent “cat”?

  19. One-Hot Encoding Example • Let our vocab be {a, cat, saw, mouse, happy} • V = # types = 5 • Assign: a → 4, cat → 2, saw → 3, mouse → 0, happy → 1 • f(cat) = [0, 0, 1, 0, 0] • How do we represent “happy”?

  20. One-Hot Encoding Example • Let our vocab be {a, cat, saw, mouse, happy} • V = # types = 5 • Assign: a → 4, cat → 2, saw → 3, mouse → 0, happy → 1 • f(cat) = [0, 0, 1, 0, 0] • f(happy) = [0, 1, 0, 0, 0]
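The one-hot example above is mechanical enough to code directly. A minimal sketch, using the same toy vocabulary and the same index assignment as the slides (a→4, cat→2, saw→3, mouse→0, happy→1):

```python
# One-hot encoding for the toy vocabulary from the slides.
index = {"mouse": 0, "happy": 1, "cat": 2, "saw": 3, "a": 4}
V = len(index)  # V = # types = 5

def one_hot(word):
    """Return the V-dimensional binary vector f(word): 1 at the word's index, 0 elsewhere."""
    vec = [0] * V
    vec[index[word]] = 1
    return vec

print(one_hot("cat"))    # [0, 0, 1, 0, 0]
print(one_hot("happy"))  # [0, 1, 0, 0, 0]
```

Note the vectors are almost all zeros; that sparsity is what the dense embeddings on the next slide avoid.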

  21. Representing a Linguistic “Blob” 1. An array of sub-blobs: word → array of characters; sentence → array of words 2. Integer representation/one-hot encoding 3. Dense embedding: let E be some embedding size (often 100, 200, 300, etc.); represent each word w with an E-dimensional real-valued vector f(w)

  22. A Dense Representation (E=2)
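A dense representation replaces the V-dimensional binary vector with a short real-valued one. A minimal sketch with E = 2 as on the slide; the numeric values here are random placeholders, since in a real system the embeddings are learned parameters/weights:

```python
# A dense embedding table with E = 2 for the toy vocabulary.
import random

random.seed(473)
index = {"mouse": 0, "happy": 1, "cat": 2, "saw": 3, "a": 4}
E = 2  # embedding size (real systems often use 100, 200, 300, ...)

# One E-dimensional real-valued vector per vocabulary type.
embeddings = [[random.uniform(-1.0, 1.0) for _ in range(E)] for _ in index]

def embed(word):
    """Return the dense vector f(word) for a word in the vocabulary."""
    return embeddings[index[word]]

print(embed("cat"))  # two real numbers (the values are arbitrary here)
```

Unlike the one-hot vectors, every entry carries information, and nearby vectors can encode that two words behave similarly.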

  23. Where Do We Observe Language? • All around us • NLP/CL: from a corpus (pl: corpora) – Literally a “body” of text • In real life: – Through curators (the LDC) – From the web (scrape Wikipedia, Reddit, etc.) – Via careful human elicitation (lab studies, crowdsourcing) – From previous efforts • In this class: the Universal Dependencies

  24. http://universaldependencies.org/ part-of-speech & syntax for > 120 languages
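Universal Dependencies treebanks are distributed in the CoNLL-U format: ten tab-separated columns per token line, with comment lines starting with "#". A minimal reading sketch; the annotations below are illustrative, not copied from an actual UD treebank:

```python
# Reading one sentence in CoNLL-U, the format used by Universal Dependencies.
conllu = (
    "# text = I ate the meal .\n"
    "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tate\teat\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tthe\tthe\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\tmeal\tmeal\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

tokens = []
for line in conllu.splitlines():
    if line and not line.startswith("#"):
        cols = line.split("\t")
        # Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
        tokens.append((cols[1], cols[3], cols[7]))

print(tokens)
# [('I', 'PRON', 'nsubj'), ('ate', 'VERB', 'root'), ('the', 'DET', 'det'),
#  ('meal', 'NOUN', 'obj'), ('.', 'PUNCT', 'punct')]
```

One file format thus packages both of the tasks coming up in this deck: part-of-speech tags (UPOS) and syntactic structure (HEAD/DEPREL).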

  25. “Language is Productive” http://www.qwantz.com/index.php?comic=170

  26. Adapted from Jason Eisner, Noah Smith

  27. orthography Adapted from Jason Eisner, Noah Smith

  28. orthography morphology: study of how words change Adapted from Jason Eisner, Noah Smith

  29. Watergate

  30. Watergate ➔ Troopergate, Bridgegate, Deflategate

  31. orthography morphology lexemes: a basic “unit” of language Adapted from Jason Eisner, Noah Smith

  32. Ambiguity Kids Make Nutritious Snacks

  33. Ambiguity Kids Make Nutritious Snacks Kids Prepare Nutritious Snacks Kids Are Nutritious Snacks sense ambiguity

  34. orthography morphology lexemes syntax: study of structure in language Adapted from Jason Eisner, Noah Smith

  35. Ambiguity British Left Waffles on Falkland Islands

  36. Lexical Ambiguity… British Left Waffles on Falkland Islands (the slide highlights “British,” “Left,” and “Waffles” in turn)

  37. … yields the “Part of Speech Tagging” task British/Adjective Left/Noun Waffles/Verb on Falkland Islands British/Noun Left/Verb Waffles/Noun on Falkland Islands
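The two readings of the headline can be checked programmatically. A toy sketch of the lexical ambiguity behind POS tagging: each word type admits several tags, and tagging picks one tag per token in context. The tag sets here are made up for the headline example, not taken from a real lexicon:

```python
# Each ambiguous word type maps to its set of possible POS tags.
possible_tags = {
    "British": {"Adjective", "Noun"},
    "Left":    {"Noun", "Verb"},
    "Waffles": {"Verb", "Noun"},
}

# The two readings from the slide: the political Left hesitates,
# vs. the British abandoned breakfast food on the islands.
reading_politics = [("British", "Adjective"), ("Left", "Noun"), ("Waffles", "Verb")]
reading_food = [("British", "Noun"), ("Left", "Verb"), ("Waffles", "Noun")]

for reading in (reading_politics, reading_food):
    # Check every chosen tag is licensed for that word.
    assert all(tag in possible_tags[word] for word, tag in reading)
    print(" ".join(f"{word}/{tag}" for word, tag in reading))
# British/Adjective Left/Noun Waffles/Verb
# British/Noun Left/Verb Waffles/Noun
```

With three two-way ambiguous words there are 2×2×2 = 8 tag sequences; a tagger's job is to prefer the consistent, plausible ones.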

  38. Parts of Speech Classes of words that behave like one another in “similar” contexts. Pronunciation (stress) can differ: object (noun: OB-ject) vs. object (verb: ob-JECT). POS tagging can help improve the inputs to other systems (text-to-speech, syntactic parsing).

  39. Syntactic Ambiguity… Pat saw Chris with the telescope on the hill. I ate the meal with friends.

  40. … yields the “Syntactic Parsing” task Pat saw Chris with the telescope on the hill. I ate the meal with friends. (The slide draws labeled dependency arcs, e.g. dobj, ncomp.)

  41. Syntactic Parsing Syntactic parsing: perform a “meaningful” structural analysis according to grammatical rules. I ate the meal with friends. (The slide shows a constituency tree with nodes S, VP, NP, PP.)

  42. Syntactic Parsing Can Help Disambiguate I ate the meal with friends. (Constituency tree shown: S, VP, NP, PP.)

  43. Syntactic Parsing Can Help Disambiguate I ate the meal with friends. (Two competing parse trees shown, built from S, NP, VP, PP.)

  44. Clearly Show Ambiguity… But Not Necessarily All Ambiguity I ate the meal with a fork. I ate the meal with gusto. I ate the meal with friends. (The same tree shape is shown for all three: S, VP, NP, PP.)
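The competing analyses of "I ate the meal with friends" can be written down as bracketed constituency trees, using the node labels from the slides (S, NP, VP, PP). The exact bracketings below are an illustrative sketch of the two PP attachments, not output from a parser:

```python
# Two competing constituency analyses of the same sentence.
# VP attachment: the eating was done with friends.
pp_attaches_to_vp = "(S (NP I) (VP (VP ate (NP the meal)) (PP with friends)))"
# NP attachment: the meal somehow comes with friends.
pp_attaches_to_np = "(S (NP I) (VP ate (NP (NP the meal) (PP with friends))))"

for tree in (pp_attaches_to_vp, pp_attaches_to_np):
    # Crude well-formedness check: brackets must balance.
    assert tree.count("(") == tree.count(")")
    print(tree)
```

The word string is identical in both; only the bracketing differs, which is exactly the ambiguity that parsing makes explicit (and that "with a fork" vs. "with gusto" shows the tree alone cannot always resolve).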

  45. orthography morphology lexemes syntax semantics: study of (literal?) meaning Adapted from Jason Eisner, Noah Smith

  46. orthography morphology lexemes syntax semantics pragmatics: study of (implied?) meaning Adapted from Jason Eisner, Noah Smith

  47. orthography morphology lexemes syntax semantics pragmatics discourse: study of how we communicate Adapted from Jason Eisner, Noah Smith

  48. Semantics → Discourse Processing John stopped at the donut store. Courtesy Jason Eisner

  50. Semantics → Discourse Processing John stopped at the donut store before work . Courtesy Jason Eisner

  51. Semantics → Discourse Processing John stopped at the donut store on his way home . Courtesy Jason Eisner

  52. Semantics → Discourse Processing John stopped at the donut shop. John stopped at the trucker shop. John stopped at the mom & pop shop. John stopped at the red shop. Courtesy Jason Eisner

  53. Discourse Processing through Coreference I spread the cloth on the table to protect it. I spread the cloth on the table to display it. Courtesy Jason Eisner

  54. Discourse Processing through Coreference I spread the cloth on the table to protect it . I spread the cloth on the table to display it . Courtesy Jason Eisner
