Lecture 5: Morphology
Kai-Wei Chang, CS @ University of Virginia, kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
6501 Natural Language Processing
This lecture
- What is the structure of words?
- Can we build an analyzer to model the structure of words?
- Finite-state automata and regular expressions
Words
- Finite-state methods are particularly useful in dealing with a lexicon
  - Compact representations of words
- Agenda
  - Some facts about words
  - Computational methods
A Turkish word
- How about English?
Example from Julia Hockenmaier, Intro to NLP
Longest word in English
- Longest word in Shakespeare's works: Honorificabilitudinitatibus (27 letters)
- Longest non-technical word: Antidisestablishmentarianism (28 letters)
- Longest word in a major dictionary: Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
- Longest word in literature: Lopadotemachoselachogaleokranioleipsano...pterygon (182 letters), an Ancient Greek transliteration
- Methionylthreonylthreonylglutaminylarginyl...isoleucine (189,819 letters), the chemical name of a protein
What is Morphology?
- The ways that words are built up from smaller meaningful units (morphemes)
- Two classes of morphemes
  - Stems: the core meaning-bearing units
  - Affixes: adhere to stems to change their meanings and grammatical functions
- e.g., dis-grace-ful-ly
Inflectional Morphology
Creates different forms of the same word:
- Examples:
  - Verbs: walk, walked, walks
  - Nouns: book, books, book’s
  - Personal pronouns: he, she, her, them, us
- Serves a grammatical/semantic purpose that is different from the original but is transparently related to the original
Derivational Morphology
Creates different words from the same lemma:
- Nominalization:
  - V + -ation: e.g., computerization
  - V + -er: killer
- Negation:
  - un-: undo, unseen, ...
  - mis-: mistake, misunderstand, ...
- Adjectivization:
  - V + -able: doable
  - N + -al: national
What else?
- Compounding combines words into a new word:
  - cream, ice cream, ice cream cone, ice cream cone bakery
- Word formation is productive:
  - Google, Googler, to google, to misgoogle, to googlefy, googlification
  - Google Map, Google Book, ...
Morphological parsing and generation
- Morphological parsing:
- Morphological generation:
  - What words can be generated from grace? grace, graceful, gracefully, disgrace, ungrace, undisgraceful, undisgracefully
Finite State Automata
- FSAs and regular expressions have the same expressive power
- The above FSA accepts the strings matched by the regular expression /baa+!/
Finite State Automata
- It has 5 states
- Alphabet: {b, a, !}
  - Terminology: "alphabet" just means a finite set of symbols in the input
- Start state: q0
- Accept state: q4
  - An FSA can have many accept states
- 5 transitions
- Are there other machines that correspond to the same language /baa+!/?
  - Yes
Formal definition
- You can specify an FSA by enumerating the following things:
  - The set of states: Q
  - A finite alphabet: Σ
  - A start state
  - A set of accept/final states
  - A transition function that maps Q × Σ to Q
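The five components listed above map directly onto a small data structure. The following is a minimal sketch, assuming the /baa+!/ machine from the earlier slides; the class name `FSA`, the state names `q0` to `q4`, and the choice to treat a missing transition as rejection are illustrative assumptions, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class FSA:
    """A deterministic FSA given by its five components (illustrative sketch)."""
    states: frozenset    # Q
    alphabet: frozenset  # Sigma
    start: str           # start state
    accept: frozenset    # set of accept/final states
    delta: dict          # transition function as a dict: (state, symbol) -> state

    def accepts(self, string):
        """Walk the transitions; a missing transition means the string is rejected."""
        state = self.start
        for symbol in string:
            key = (state, symbol)
            if key not in self.delta:
                return False
            state = self.delta[key]
        return state in self.accept


# Assumed encoding of the /baa+!/ machine: 5 states, 5 transitions.
SHEEP = FSA(
    states=frozenset({"q0", "q1", "q2", "q3", "q4"}),
    alphabet=frozenset("ba!"),
    start="q0",
    accept=frozenset({"q4"}),
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",  # self-loop for additional a's
        ("q3", "!"): "q4",
    },
)

print(SHEEP.accepts("baaa!"))  # accepted
print(SHEEP.accepts("ba!"))    # rejected: needs at least two a's
```

Storing the transition function as a partial dictionary keeps the sketch short; a total function would add an explicit "fail" state instead.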
Example -- dollars and Cents
Yet another view – table representation

State   b    a     !    ε
0       1    -     -    -
1       -    2     -    -
2       -    2,3   -    -
3       -    -     4    -
4       -    -     -    -

(- means no transition)
If you’re in state 1 and you’re looking at an a, go to state 2.
Non-Deterministic FSA
- ε-transitions
- More than one possible next state
- Equivalent to a deterministic FSA
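One way to see why non-determinism adds no power is to simulate the machine by tracking the set of states it could be in. The sketch below assumes the table from the previous slide (state 2 on a goes to 2 or 3) and uses the empty string `""` as a stand-in for ε; the function names are illustrative.

```python
def epsilon_closure(states, delta):
    """All states reachable from `states` using only epsilon ("") transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in delta.get((state, ""), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def nfa_accepts(string, delta, start, accept):
    """Simulate an NFA by keeping the set of all currently possible states."""
    current = epsilon_closure({start}, delta)
    for symbol in string:
        moved = set()
        for state in current:
            moved |= set(delta.get((state, symbol), ()))
        current = epsilon_closure(moved, delta)
        if not current:  # no path survives, reject early
            return False
    return bool(current & set(accept))


# Assumed encoding of the non-deterministic /baa+!/ table.
NFA_DELTA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},  # the non-deterministic choice
    (3, "!"): {4},
}

print(nfa_accepts("baaa!", NFA_DELTA, start=0, accept={4}))
print(nfa_accepts("ba!", NFA_DELTA, start=0, accept={4}))
```

Each set of NFA states visited here corresponds to a single state of an equivalent deterministic machine, which is the idea behind the equivalence.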
Regular expression
- Equivalent to FSA
- Matching strings with regular expressions (e.g., in Perl, Python, grep) amounts to:
  - translating the regular expression into a machine (a table), and
  - passing the table and the string to an interpreter
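In Python, the two steps above roughly correspond to compiling a pattern and then running it against a string. This is a small sketch using the standard `re` module and the /baa+!/ pattern from the earlier slides; the helper name `is_sheep_talk` is illustrative, and the internal form of the compiled pattern is an implementation detail that need not be a literal state table.

```python
import re

# Step 1: translate the regular expression into a compiled matcher.
SHEEP_RE = re.compile(r"baa+!")


def is_sheep_talk(string):
    """Step 2: hand the compiled pattern and the string to the matcher.

    fullmatch requires the whole string to match, like an FSA that must
    consume all input and end in an accept state.
    """
    return SHEEP_RE.fullmatch(string) is not None


for candidate in ["baa!", "baaaaa!", "ba!", "xbaa!"]:
    print(candidate, is_sheep_talk(candidate))
```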
Model morphology with FSA
- Regular singular nouns are ok
- Regular plural nouns have an -s on the end
- Irregulars are ok as is
Now plug in the words
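Plugging words into the noun machine can be sketched as building a letter-level FSA (a trie) from a lexicon. The word lists below are hypothetical stand-ins for the lexicon in the slide figure, and the function names are illustrative; regular nouns contribute both the bare form and the form with -s, while irregular forms are listed as is.

```python
# Hypothetical toy lexicon (not taken from the lecture).
REG_NOUNS = {"cat", "dog", "aardvark"}
IRREG_SG = {"goose", "mouse"}
IRREG_PL = {"geese", "mice"}


def build_noun_fsa(reg_nouns, irreg_sg, irreg_pl):
    """Build a letter-level FSA: state 0 is the start, one state per prefix."""
    delta = {}      # (state, letter) -> state
    accept = set()
    words = set(reg_nouns) | {w + "s" for w in reg_nouns} | set(irreg_sg) | set(irreg_pl)
    for word in sorted(words):
        state = 0
        for letter in word:
            if (state, letter) not in delta:
                # In a trie every new transition creates exactly one new state.
                delta[(state, letter)] = len(delta) + 1
            state = delta[(state, letter)]
        accept.add(state)
    return delta, accept


def recognize(word, delta, accept):
    """Follow the transitions letter by letter; accept only in a final state."""
    state = 0
    for letter in word:
        if (state, letter) not in delta:
            return False
        state = delta[(state, letter)]
    return state in accept


DELTA, ACCEPT = build_noun_fsa(REG_NOUNS, IRREG_SG, IRREG_PL)
for w in ["cats", "geese", "gooses", "ca"]:
    print(w, recognize(w, DELTA, ACCEPT))
```

Because irregular nouns get no -s arc, forms such as "gooses" are rejected even though "goose" is accepted.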
Derivational Rules
From recognition to parsing
- Now we can use these machines to recognize strings
- Can we use the machines to assign a structure to a string? (parsing)
- Example:
  - From “cats” to “cat +N +p”
Transitions
c:c  a:a  t:t  ε:+N  s:+p
- c:c reads a c and writes a c
- ε:+N reads nothing and writes +N
Challenge: Ambiguity
- books: book +N +p or book +V +z (3rd person)
- Non-deterministic FSA: allows multiple paths through a machine to lead to the same accept state
- Bias the search (or learn) so that a few likely paths are explored
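The input:output arcs and the ambiguity of "books" can be sketched together with a tiny transducer that explores every path. The arc list, state numbers, and function name below are illustrative assumptions; only the plural and third-person analyses named on the slides are encoded, so bare forms such as "cat" are deliberately not accepted, and `""` stands for ε.

```python
# Each arc is (source state, input symbol, output string, target state).
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),                    # c:c a:a t:t
    (0, "b", "b", 4), (4, "o", "o", 5), (5, "o", "o", 6), (6, "k", "k", 7),  # b:b o:o o:o k:k
    (3, "", " +N", 8),   # eps:+N after "cat"
    (7, "", " +N", 8),   # eps:+N after "book"
    (7, "", " +V", 9),   # eps:+V after "book" (the ambiguous choice)
    (8, "s", " +p", 10),  # s:+p, noun plural
    (9, "s", " +z", 10),  # s:+z, verb 3rd person
]
ACCEPT = {10}


def transduce(word, arcs=ARCS, start=0, accept=ACCEPT):
    """Return every output string produced by an accepting path over `word`."""
    results = []
    stack = [(start, 0, "")]  # (state, position in word, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in accept:
            results.append(out)
        for src, inp, outp, dst in arcs:
            if src != state:
                continue
            if inp == "":  # epsilon arc: write output, consume nothing
                stack.append((dst, pos, out + outp))
            elif pos < len(word) and word[pos] == inp:
                stack.append((dst, pos + 1, out + outp))
    return sorted(results)


print(transduce("cats"))
print(transduce("books"))
```

This exhaustive search returns all analyses; biasing the search, as the slide suggests, would mean ordering or pruning the stack entries rather than exploring every path.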
Challenge: Spelling rules
- The underlying morphemes (e.g., plural -s) can have different surface realizations (-s, -es)
  - cat+s = cats
  - fox+s = foxes
  - make+ing = making
- How can we model it?
Intermediate representation
Overall Scheme
- One FST that has explicit information about the lexicon
  - Lexical level to intermediate forms
- A large set of machines that capture spelling rules
  - Intermediate forms to surface
Lexical to intermediate level
Intermediate level to surface
- The "add an e" rule for -s
- Example: fox^s# ↔ foxes#
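The intermediate-to-surface direction of this rule can be approximated with a regular-expression rewrite. This is a sketch only: it assumes the rule fires after x, s, or z (the slide shows just the fox example), uses `^` for the morpheme boundary and `#` for the word boundary as on the slide, and the function name `to_surface` is illustrative.

```python
import re


def to_surface(intermediate):
    """Map an intermediate form such as 'fox^s#' to its surface form 'foxes#'."""
    # e-insertion: after x, s, or z, a morpheme boundary followed by s# becomes e.
    surface = re.sub(r"([xsz])\^(?=s#)", r"\1e", intermediate)
    # Any remaining morpheme boundaries are simply deleted.
    return surface.replace("^", "")


print(to_surface("fox^s#"))
print(to_surface("cat^s#"))
```

A full system would run many such rules as transducers in parallel or in cascade; a single substitution is used here only to make the one example concrete.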
Other applications of FSTs
- ELIZA: https://en.wikipedia.org/wiki/ELIZA
- Implemented using pattern matching (FST)
ELIZA as an FST cascade
Human: You don't argue with me.
Computer: WHY DO YOU THINK I DON'T ARGUE WITH YOU
A simple rule:
1. Replace you with I and me with you: I don't argue with you.
2. Replace <...> with Why do you think <...>: Why do you think I don't argue with you.
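The two-step rule above can be sketched as a cascade of regular-expression substitutions. The function names are illustrative, and a few details are assumptions beyond the slide: the pronoun swap is done in a single pass so that "me" is not first turned into "you" and then into "I", trailing punctuation is stripped, and the reply is upper-cased to match the example.

```python
import re

# Step 1 mapping: you -> I, me -> you (applied simultaneously).
SWAPS = {"you": "I", "me": "you"}


def swap_pronouns(text):
    """Replace whole-word 'you' and 'me' in one pass, ignoring case."""
    return re.sub(
        r"\b(you|me)\b",
        lambda match: SWAPS[match.group(1).lower()],
        text,
        flags=re.IGNORECASE,
    )


def eliza_reply(utterance):
    """Step 1: swap pronouns. Step 2: wrap the result in 'Why do you think <...>'."""
    stripped = utterance.strip().rstrip(".!?")
    swapped = swap_pronouns(stripped)
    return ("Why do you think " + swapped).upper()


print(eliza_reply("You don't argue with me."))
```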
What about compounds?
- Compounds have hierarchical structure:
  - (((ice cream) cone) bakery) not (ice ((cream cone) bakery))
  - ((computer science) (graduate student)) not (computer ((science graduate) student))
- We need context-free grammars to capture this underlying structure