Introduction to NLP
What Makes Human Languages Interesting?
◮ Connecting minds: how one person’s thoughts reach into another’s
◮ Gender assignment to words, explicit in some languages
◮ Even in English, think of pronouns and names
  ◮ Cat
  ◮ Book
  ◮ Faith
  ◮ Hope
Munindar P. Singh (NCSU) Natural Language Processing Fall 2020 7

What Makes Human Languages Challenging?
◮ Sarcasm
◮ Versus logic
  ◮ No no
  ◮ Yes yes

Applications of NLP
What makes NLP so valuable?

Brief Historical Look
◮ Ad hoc
◮ Inspired by cognitive science
◮ Knowledge-based
◮ Statistical
◮ Speech

Hierarchy of Language Concepts
Not to be taken too seriously

Discourse   Passage
Sentence    Assertion
Word        Unit of meaning
Morpheme    Meaning component
Phoneme     Language sound
Signal      Audio

◮ How would you pronounce project?
  ◮ Verb vs. noun

Language as a Symbolic System
Also called semiotics

Pragmatics  Meaning based on words and context
Semantics   Meaning based on words
Syntax      Structure of symbols
Symbol      Token (morpheme, phoneme, lexeme)

◮ Holy grail: to express meaning compositionally
  ◮ Meaning of whole = combination of meanings of parts

Text Normalization
◮ Tokenization
  ◮ Punctuation
  ◮ Abbreviations
  ◮ Number, date, email address, ...
  ◮ Clitics: not standalone, e.g., n’t
  ◮ Case to mark names, e.g., mark vs. Mark
  ◮ Hyphenated words
◮ Normalization
  ◮ Case folding
  ◮ Stemming: remove affixes
    ◮ Porter stemming: popular but heavy-handed application of rules
  ◮ Lemmatization: standard root, even if superficially different, e.g., {am, is} ⇒ be
◮ Challenges
  ◮ Scripts such as Chinese
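
The tokenization and normalization steps listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression `TOKEN_RE`, the tiny `LEMMAS` table, and the function names `tokenize` and `normalize` are hypothetical and not part of the course material. The sketch handles ASCII apostrophes only, does not attempt stemming, and would not work for scripts such as Chinese that do not separate words with spaces.

```python
import re

# Hypothetical, deliberately tiny lemma table; a real lemmatizer relies on a full lexicon.
LEMMAS = {"am": "be", "is": "be", "are": "be"}

# Alternatives, tried in order:
#   1. the host word of a clitic ("is" in "isn't"), found via lookahead
#   2. the clitic n't itself
#   3. a word, possibly hyphenated
#   4. a number, possibly with a decimal part
#   5. any single punctuation mark
TOKEN_RE = re.compile(
    r"[A-Za-z]+(?=n't)|n't|[A-Za-z]+(?:-[A-Za-z]+)*|\d+(?:\.\d+)?|[^\w\s]"
)


def tokenize(text):
    """Split text into tokens, separating the clitic n't from its host word."""
    return TOKEN_RE.findall(text)


def normalize(tokens):
    """Case-fold each token, then map it to a lemma if the table has one."""
    folded = [token.lower() for token in tokens]
    return [LEMMAS.get(token, token) for token in folded]


if __name__ == "__main__":
    tokens = tokenize("Mark isn't here; the well-known mark is.")
    print(tokens)
    print(normalize(tokens))
```

Note that case folding merges Mark and mark, which is exactly the information loss the slide warns about when case marks names.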

Minimum Edit Distance
Illustration of dynamic programming
◮ Source string X of length n, with prefixes X[1..i], i ∈ [1..n]
◮ Target string Y of length m, with prefixes Y[1..j], j ∈ [1..m]
◮ Edit distance D(i, j) between X[1..i] and Y[1..j]
◮ D(0, 0) = 0; for i ∈ [1..n] and j ∈ [1..m]:
\[
D(i, j) = \min
\begin{cases}
D(i-1, j) + \text{del-cost}(X[i]) \\
D(i, j-1) + \text{ins-cost}(Y[j]) \\
D(i-1, j-1) + \text{sub-cost}(X[i], Y[j])
\end{cases}
\]
◮ Boundary cells use only deletions or only insertions:
\[
D(i, 0) = D(i-1, 0) + \text{del-cost}(X[i]), \qquad
D(0, j) = D(0, j-1) + \text{ins-cost}(Y[j])
\]
◮ Levenshtein values
\[
D(i, j) = \min
\begin{cases}
D(i-1, j) + 1 \\
D(i, j-1) + 1 \\
D(i-1, j-1) +
  \begin{cases}
  2 & X[i] \neq Y[j] \\
  0 & X[i] = Y[j]
  \end{cases}
\end{cases}
\]
◮ D(n, m) is the answer; recover the edit path by tracing back from (n, m) to (0, 0)
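
The recurrence on this slide translates almost directly into a table-filling loop. The sketch below is a minimal illustration, not the course's reference implementation: the function name `min_edit_distance` and its keyword parameters are assumptions, and it simplifies the per-character cost functions to constant costs, defaulting to the Levenshtein values used here (insert 1, delete 1, substitute 2). Python strings are 0-indexed, so X[i] on the slide corresponds to `source[i - 1]` in the code.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Return D(n, m), the minimum edit distance from source to target."""
    n, m = len(source), len(target)

    # d[i][j] holds D(i, j): the distance between source[:i] and target[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]

    # Boundary column: turning source[:i] into the empty string takes i deletions.
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost

    # Boundary row: turning the empty string into target[:j] takes j insertions.
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost

    # Interior cells: the cheapest of delete, insert, or substitute/match.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,
                d[i][j - 1] + ins_cost,
                d[i - 1][j - 1] + (0 if same else sub_cost),
            )

    return d[n][m]


if __name__ == "__main__":
    print(min_edit_distance("there", "their"))
```

Passing `sub_cost=1` gives the other common Levenshtein variant, in which a substitution costs the same as a single insertion or deletion.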

Levenshtein Example
There (Source) ⇒ Their (Target)

                Target
            0   1   2   3   4   5
Source      #   T   H   E   I   R
0   #
1   T
2   H
3   E
4   R
5   E
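
One way to work through this example is to fill the table programmatically and then trace back from the bottom-right cell. The sketch below assumes the Levenshtein costs from the previous slide (insert 1, delete 1, substitute 2); the function names `edit_table` and `backtrace` are hypothetical. When several cells tie, the backtrace picks match first, then delete, then insert, so it returns one optimal path out of possibly many.

```python
def edit_table(source, target):
    """Fill the full D(i, j) table with costs: insert 1, delete 1, substitute 2."""
    n, m = len(source), len(target)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # i deletions
    for j in range(1, m + 1):
        d[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 2
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d


def backtrace(source, target, d):
    """Recover one optimal edit path by walking from (n, m) back to (0, 0)."""
    i, j = len(source), len(target)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1] and d[i][j] == d[i - 1][j - 1]:
            ops.append(("keep", source[i - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", source[i - 1]))
            i -= 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(("insert", target[j - 1]))
            j -= 1
        else:
            # Safeguard: with substitution cost 2, a delete or insert always ties,
            # so the earlier branches are expected to handle every cell.
            ops.append(("substitute", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    table = edit_table("THERE", "THEIR")
    for label, row in zip("#THERE", table):
        print(label, row)
    print(backtrace("THERE", "THEIR", table))
```

For There ⇒ Their the bottom-right cell is 2, and the recovered path keeps T, H, and E, inserts I, keeps R, and deletes the final E.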