  1. Introduction to NLP
     What Makes Human Languages Interesting?
     ◮ Connecting minds: how one person’s thoughts reach into another’s
     ◮ Gender assignment to words, explicit in some languages
       ◮ Even in English, think of pronouns and names
       ◮ Cat
       ◮ Book
       ◮ Faith
       ◮ Hope
     Munindar P. Singh (NCSU) Natural Language Processing Fall 2020

  2. What Makes Human Languages Challenging?
     ◮ Sarcasm
     ◮ Versus logic
       ◮ No no
       ◮ Yes yes

  3. Applications of NLP
     What makes NLP so valuable?

  4. Brief Historical Look
     ◮ Ad hoc
     ◮ Inspired by cognitive science
     ◮ Knowledge-based
     ◮ Statistical
     ◮ Speech

  5. Hierarchy of Language Concepts
     Not to be taken too seriously; from the largest unit to the smallest:
     ◮ Discourse: passage
     ◮ Sentence: assertion
     ◮ Word: unit of meaning
     ◮ Morpheme: meaning component
     ◮ Phoneme: language sound
     ◮ Audio: signal
     ◮ How would you pronounce "project"?
       ◮ Verb vs. noun

  6. Language as a Symbolic System
     The study of such symbol systems is called semiotics. Layers, from top to bottom:
     ◮ Pragmatics: meaning based on words and context
     ◮ Semantics: meaning based on words
     ◮ Syntax: structure of symbols
     ◮ Symbol: token (morpheme, phoneme, lexeme)
     ◮ Holy grail: to express meaning compositionally
       ◮ Meaning of whole = combination of meanings of parts
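A toy illustration of compositionality, under strong simplifying assumptions: the "language" below is a tiny fragment of English built only from true, false, not, and, or. Each word denotes a truth value or a truth function, and the meaning of a phrase is computed solely from the meanings of its parts and the way they combine. The lexicon, the grammar with its precedence (not over and over or), and the function name `meaning` are hypothetical; real sentences are far less cooperative.

```python
from typing import Callable, Dict, Union

# Hypothetical toy lexicon: each word denotes a truth value or a truth function.
Meaning = Union[bool, Callable[..., bool]]
LEXICON: Dict[str, Meaning] = {
    "true": True,
    "false": False,
    "not": lambda p: not p,
    "and": lambda p, q: p and q,
    "or": lambda p, q: p or q,
}


def meaning(sentence: str) -> bool:
    """Compose the meaning of a phrase such as 'true and not false' from its words.

    Assumed toy grammar (precedence: not > and > or):
      expr   := term ("or" term)*
      term   := factor ("and" factor)*
      factor := "not" factor | "true" | "false"
    """
    tokens = sentence.lower().split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("phrase ended too early")
        pos += 1
        return tokens[pos - 1]

    def factor():
        word = take()
        if word == "not":
            # Meaning of 'not X' = the 'not' function applied to the meaning of X.
            return LEXICON["not"](factor())
        if word in ("true", "false"):
            return LEXICON[word]
        raise ValueError(f"unexpected word: {word!r}")

    def term():
        value = factor()
        while peek() == "and":
            take()
            value = LEXICON["and"](value, factor())
        return value

    def expr():
        value = term()
        while peek() == "or":
            take()
            value = LEXICON["or"](value, term())
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing words: {tokens[pos:]}")
    return result


print(meaning("true and not false"))
print(meaning("not not true"))
```

In this toy logic "not not true" comes out true, which is exactly where everyday speech can part ways with logic: an emphatic "No no" is not a yes.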

  7. Text Normalization
     ◮ Tokenization
       ◮ Punctuation
       ◮ Abbreviations
       ◮ Numbers, dates, email addresses, . . .
       ◮ Clitics, which cannot stand alone, e.g., n’t
       ◮ Case used to mark names, e.g., mark vs. Mark
       ◮ Hyphenated words
     ◮ Normalization
       ◮ Case folding
       ◮ Stemming: remove affixes
         ◮ Porter stemming: popular, but a heavy-handed application of rules
       ◮ Lemmatization: map to a standard root (the lemma), even if the forms differ superficially, e.g., { am, is } ⇒ be
     ◮ Challenges
       ◮ Scripts such as Chinese, which do not separate words with spaces
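To make the tokenization and normalization steps concrete, here is a minimal Python sketch. The names `CLITIC_RE`, `TOKEN_RE`, `LEMMAS`, `SUFFIXES`, `tokenize`, and `normalize` are hypothetical. The clitic handling assumes a Penn Treebank-style split of n't (isn't ⇒ is n't); the lemma table is a tiny hand-written stand-in for a real dictionary; and the suffix stripper is deliberately crude and is not the Porter algorithm.

```python
import re

# Step 1 (assumed convention): split off the clitic n't, e.g. isn't -> is n't.
CLITIC_RE = re.compile(r"(\w)n't\b")

# Step 2: one alternative per token type; earlier alternatives are tried first.
TOKEN_RE = re.compile(
    r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"  # email addresses
    r"|(?:[A-Za-z]\.){2,}"           # abbreviations such as U.S.
    r"|\d+(?:[.,/]\d+)*"             # numbers and simple dates: 3.14, 1,000, 10/05/2020
    r"|n't"                          # clitic split off in step 1
    r"|[A-Za-z]+(?:-[A-Za-z]+)*"     # words, keeping hyphenated words whole
    r"|[^\w\s]"                      # any other single punctuation mark
)

# Hypothetical toy lemma table; a real lemmatizer needs a full dictionary.
LEMMAS = {"am": "be", "is": "be", "are": "be", "was": "be", "were": "be"}

# Hypothetical toy suffix list, tried in order; crude on purpose, NOT the Porter rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Split text into tokens using the toy rules above."""
    text = CLITIC_RE.sub(r"\1 n't", text)
    return TOKEN_RE.findall(text)


def normalize(token: str) -> str:
    """Case-fold, then lemmatize by lookup, else strip one suffix."""
    t = token.lower()  # case folding: this erases the mark vs. Mark distinction
    if t in LEMMAS:    # lemmatization by table lookup
        return LEMMAS[t]
    for suffix in SUFFIXES:  # stemming by crude suffix stripping
        if t.endswith(suffix) and len(t) - len(suffix) >= 3:
            return t[: -len(suffix)]
    return t


print(tokenize("Mark isn't in the U.S. on 10/05/2020; email mark@ncsu.edu."))
print([normalize(w) for w in ["Walking", "walked", "am", "Mark", "news"]])
```

Case folding maps Mark to mark, discarding the name signal, and the toy suffix rule turns news into new: both show how blunt rules trade precision for simplicity.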

  8. Minimum Edit Distance
     Illustration of dynamic programming
     ◮ Source string X[n], prefixes X[1..i], i ∈ [1..n]
     ◮ Target string Y[m], prefixes Y[1..j], j ∈ [1..m]
     ◮ Edit distance D(i, j) between X[1..i] and Y[1..j]
     ◮ Base cases: D(0, 0) = 0; D(i, 0) = D(i − 1, 0) + del-cost(X[i]) for i ∈ [1..n]; D(0, j) = D(0, j − 1) + ins-cost(Y[j]) for j ∈ [1..m]
     ◮ Recurrence, for i ∈ [1..n] and j ∈ [1..m]:

$$
D(i,j) = \min \begin{cases}
D(i-1,j) + \text{del-cost}(X[i]) \\
D(i,j-1) + \text{ins-cost}(Y[j]) \\
D(i-1,j-1) + \text{sub-cost}(X[i],Y[j])
\end{cases}
$$

     ◮ Levenshtein values (so D(i, 0) = i and D(0, j) = j):

$$
D(i,j) = \min \begin{cases}
D(i-1,j) + 1 \\
D(i,j-1) + 1 \\
D(i-1,j-1) + \begin{cases} 2 & \text{if } X[i] \neq Y[j] \\ 0 & \text{if } X[i] = Y[j] \end{cases}
\end{cases}
$$

     ◮ D(n, m) is the answer; compute the path (via backpointers) from (n, m) back to (0, 0)
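The recurrence translates almost line for line into a bottom-up table fill followed by a backtrace. The Python sketch below is one way to write it; the function name `min_edit_distance`, its parameters, and the operation labels are hypothetical, and the default costs are assumed to be the Levenshtein values (deletion and insertion 1, substitution 2, match 0).

```python
from typing import Callable, List, Optional, Tuple


def min_edit_distance(
    source: str,
    target: str,
    del_cost: Callable[[str], int] = lambda x: 1,
    ins_cost: Callable[[str], int] = lambda y: 1,
    sub_cost: Callable[[str, str], int] = lambda x, y: 0 if x == y else 2,
) -> Tuple[int, List[Tuple[str, str, str]]]:
    """Return D(n, m) and one optimal alignment as (op, source_char, target_char)."""
    n, m = len(source), len(target)
    # D[i][j] is the cost of editing X[1..i] = source[:i] into Y[1..j] = target[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # first column: delete all of X[1..i]
        D[i][0] = D[i - 1][0] + del_cost(source[i - 1])
        back[i][0] = "del"
    for j in range(1, m + 1):  # first row: insert all of Y[1..j]
        D[0][j] = D[0][j - 1] + ins_cost(target[j - 1])
        back[0][j] = "ins"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # X[i] is source[i - 1] because Python strings are 0-indexed.
            D[i][j], back[i][j] = min(
                (D[i - 1][j] + del_cost(source[i - 1]), "del"),
                (D[i][j - 1] + ins_cost(target[j - 1]), "ins"),
                (D[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]), "sub"),
            )
    # Follow the backpointers from (n, m) back to (0, 0).
    ops: List[Tuple[str, str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        op = back[i][j]
        if op == "sub":
            x, y = source[i - 1], target[j - 1]
            ops.append(("match" if x == y else "sub", x, y))
            i, j = i - 1, j - 1
        elif op == "del":
            ops.append(("del", source[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", target[j - 1]))
            j -= 1
    ops.reverse()
    return D[n][m], ops


dist, ops = min_edit_distance("intention", "execution")
print(dist)
print(ops)
```

With the default costs the distance from intention to execution is 8; passing `sub_cost=lambda x, y: 0 if x == y else 1` gives the variant where a substitution costs 1. Ties between equally cheap moves are broken arbitrarily (here by `min` comparing operation names), so the returned alignment is one of possibly several optimal ones.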

  9. Levenshtein Example
     There (Source) ⇒ Their (Target)

                    Target
                  0   1   2   3   4   5
       Source     #   T   H   E   I   R
         0   #
         1   T
         2   H
         3   E
         4   R
         5   E
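One way to check a hand-filled version of this table is to let code fill it. The sketch below assumes the Levenshtein costs (deletion and insertion 1, substitution 2, match 0) and writes both strings in uppercase, as in the table headers; the function names `levenshtein_table` and `print_table` are hypothetical.

```python
def levenshtein_table(source: str, target: str) -> list[list[int]]:
    """Fill D(i, j) with Levenshtein costs: deletion 1, insertion 1, substitution 2."""
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete every character of X[1..i]
    for j in range(m + 1):
        D[0][j] = j  # insert every character of Y[1..j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 2
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + sub)
    return D


def print_table(source: str, target: str, D: list[list[int]]) -> None:
    # Rows are source prefixes and columns are target prefixes, as in the table above.
    print("   " + "".join(f"{c:>3}" for c in "#" + target))
    for label, row in zip("#" + source, D):
        print(f"{label:>3}" + "".join(f"{v:>3}" for v in row))


D = levenshtein_table("THERE", "THEIR")
print_table("THERE", "THEIR", D)
print(D[-1][-1])  # D(5, 5)
```

The bottom-right cell, D(5, 5) = 2, corresponds to inserting I after THE and deleting the final E.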
