Korean morphology Seong-Hwan Jun Monday, April 15, 2013

Morphology
• Morpheme: the smallest grammatical unit
• A word is composed of one or more morphemes
• Example: unbreakable is made up of
1. un-: bound morpheme, cannot stand on its own
2. break: free morpheme (lexeme)
3. -able: bound morpheme, cannot stand on its own

• Derivational morpheme: changes the part of speech or the meaning of the word:
1. un-: changes the meaning
2. -able: changes the part of speech (verb to adjective)
• Inflectional morpheme: changes neither the part of speech nor the core meaning:
1. -s: pluralization
2. -ed: past tense or past participle

Computational morphology
• The field of morphology studies everything about morphemes
• Computational morphology focuses on two tasks:
1. Morphological analysis
2. Morphological disambiguation

• Morphological analyzer: produces all possible analyses of a word in terms of part of speech and inflections
• Morphological disambiguation: chooses the most plausible analysis in context
• Example: breaks
1. V+3SG
2. N+PL
• "He took too many breaks during work hours!" Here the correct analysis is N+PL.
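The two tasks above can be illustrated with a minimal sketch. The lexicon, the tag strings, and the quantifier rule are all hypothetical stand-ins chosen for this example; a real disambiguator would use a statistical model rather than a single hand-written rule.

```python
# Toy lexicon (hypothetical): lemma -> parts of speech it can take.
LEXICON = {"break": {"V", "N"}, "work": {"V", "N"}, "hour": {"N"}}

def analyze(word):
    """Morphological analysis: return every analysis the toy lexicon allows."""
    analyses = [f"{word}+{pos}" for pos in sorted(LEXICON.get(word, ()))]
    if word.endswith("s"):
        lemma = word[:-1]
        tags = LEXICON.get(lemma, set())
        if "N" in tags:
            analyses.append(f"{lemma}+N+PL")   # plural noun reading
        if "V" in tags:
            analyses.append(f"{lemma}+V+3SG")  # third-person singular verb reading
    return analyses

def disambiguate(word, previous_word):
    """Morphological disambiguation with one crude context rule:
    after a quantifier, prefer a noun reading."""
    candidates = analyze(word)
    if previous_word in {"many", "few", "several"}:
        nouns = [a for a in candidates if "+N" in a]
        if nouns:
            return nouns[0]
    return candidates[0] if candidates else None

print(analyze("breaks"))
print(disambiguate("breaks", "many"))
```

The analyzer deliberately over-generates; choosing among its outputs is left entirely to the disambiguation step.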

Computational morphology: Korean
• Morphemes attach to the main lexeme (agglutination)
• Example: 강가에서 (from riverbank)
1. lexeme: 강가 (riverbank)
2. bound morpheme: 에서 (from)
• Previous approaches: dictionary-based and rule-based (with rules extracted from a corpus)
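A dictionary-based split of the kind described above might look like the following sketch. The particle list and the lexicon are small illustrative assumptions, not a real Korean resource, and the function name is invented for this example.

```python
# Small illustrative particle list (an assumption, far from complete).
PARTICLES = ["에서", "에", "은", "는", "이", "가", "을", "를"]

def split_particle(word, lexicon):
    """Return (stem, particle), trying the longest particle first and
    accepting a split only when the stem is in the lexicon."""
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if word.endswith(particle):
            stem = word[: -len(particle)]
            if stem in lexicon:
                return stem, particle
    return word, ""  # no split found: the word is treated as unanalyzed

lexicon = {"강가", "강"}
print(split_particle("강가에서", lexicon))
print(split_particle("학교에서", lexicon))
```

The second call leaves 학교에서 unsplit because 학교 is missing from the toy lexicon, which is exactly the weakness of a finite dictionary.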

Problems
• Unknown words arise because the dictionary and corpus are finite
• Unknown words are tagged as common nouns by default
• Rule-based approach...

• Suppose you observe the word kicked (assume that you have never seen the word kick before)
• What is your guess at the part of speech of this word from observing -ed?
• The rule-based approach is only a heuristic, and its accuracy depends on the size of the corpus from which the rules were extracted
• Main idea: learn the rules
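The kicked example can be made concrete with a toy suffix-rule learner. This is only a sketch of the heuristic being criticised, not the method proposed in the talk; the tiny tagged word list, the function names, and the default tag are assumptions for illustration.

```python
from collections import Counter, defaultdict

def learn_suffix_rules(tagged_words, max_len=3):
    """Count how often each word-final suffix (up to max_len characters)
    occurs with each part-of-speech tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        for n in range(1, min(max_len, len(word) - 1) + 1):
            counts[word[-n:]][tag] += 1
    return counts

def guess_tag(word, rules, max_len=3, default="N"):
    """Guess a tag from the longest suffix seen in training; fall back to
    the default (common noun) when no suffix matches."""
    for n in range(min(max_len, len(word) - 1), 0, -1):
        suffix = word[-n:]
        if suffix in rules:
            return rules[suffix].most_common(1)[0][0]
    return default

# Hypothetical training data: "kick" and "kicked" never appear in it.
corpus = [("walked", "V"), ("jumped", "V"), ("talked", "V"),
          ("bed", "N"), ("tables", "N")]
rules = learn_suffix_rules(corpus)
print(guess_tag("kicked", rules))
```

With so few training words, the learned rules are fragile; a larger corpus changes the counts and therefore the guesses.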

Clustering I
• To learn the rules, the more data the better
• We cannot expect to annotate or label all of that data by hand
• Idea: group the words that are "similar"
• Words belonging to the same group can be used to learn the rules that occur frequently in that group

String alignment
• There are numerous ways to measure the similarity between two strings w_i and w_j:
1. Levenshtein distance
2. Probabilistic model over strings
• The probabilistic model sums over all possible alignments of w_i and w_j.
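For reference, the first similarity measure listed above, Levenshtein distance, can be computed with the standard dynamic program. This is a generic sketch that keeps only two rows of the table; it is not taken from the talk's code.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

print(levenshtein("raining", "rainier"))
```

Levenshtein distance treats every edit as equally costly, whereas a probabilistic model over strings can learn which character correspondences are more likely.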

• Log-linear model: features are defined on the alignments.
• An example of a feature is how many times a character is aligned with another character.
• Example: raining and rainier can be aligned as
raining
raini--er
f = (0, ..., 0, 1, 1, 2, 1, 0, ..., 0)
because r is aligned with r once, a is aligned with a once, i is aligned with i twice, and so on
• If p(w_i, w_j) > p(w_i, w_k), then we can conclude that w_i and w_j fit better together.
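The alignment features can be sketched as counts of aligned character pairs. The example below assumes a simple position-by-position alignment of raining and rainier with no gaps, which reproduces the counts quoted above (r with r once, i with i twice). The weights are hypothetical, and the score is unnormalized and covers a single alignment, whereas the model described here sums over all alignments.

```python
import math
from collections import Counter

def alignment_features(top, bottom):
    """Count how often each character pair is aligned, given two
    equal-length aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    return Counter(zip(top, bottom))

def log_linear_score(features, weights):
    """Unnormalized log-linear score: exp(sum of weight * feature count).
    Pairs without a weight contribute nothing."""
    return math.exp(sum(weights.get(pair, 0.0) * count
                        for pair, count in features.items()))

features = alignment_features("raining", "rainier")
print(features[("i", "i")], features[("r", "r")])

# Hypothetical weights that reward identical characters being aligned.
weights = {("i", "i"): 0.5, ("r", "r"): 0.5}
print(log_linear_score(features, weights))
```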

Clustering II: DPMM
• Analogy: Chinese Restaurant Process (CRP)
• Customer i (a word) enters the restaurant and chooses to sit at table l with probability proportional to the number of customers (words) already seated at that table
• Alternatively, the customer may choose to sit at a new table with probability proportional to α_0 (a parameter to be trained)
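The seating rule can be written down directly. The following is a minimal sketch of the CRP prior only; the function names are made up for this example, and `alpha` plays the role of α_0.

```python
import random

def crp_probabilities(table_sizes, alpha):
    """Seating probabilities for the next customer: one entry per existing
    table (proportional to its size), plus a final entry for a new table
    (proportional to alpha)."""
    total = sum(table_sizes) + alpha
    return [n / total for n in table_sizes] + [alpha / total]

def crp_seat(table_sizes, alpha, rng=random):
    """Sample a table index; an index equal to len(table_sizes) means
    the customer opens a new table."""
    probs = crp_probabilities(table_sizes, alpha)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Two tables with 3 and 1 customers, alpha = 1.
print(crp_probabilities([3, 1], 1.0))
print(crp_seat([3, 1], 1.0, random.Random(0)))
```

Larger tables attract more customers, so the process favours a few big clusters while always leaving some probability for a new one.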

Inference
• Once the table is chosen, we can assess the similarity of customer i (a word) to the other customers (words) already seated at the table, using the probabilistic model over strings
• Inference method: Gibbs sampling, which iteratively re-assesses
- the cluster of each word (CRP)
- the part-of-speech tag (trigram model)
- the inflection tag (log-linear model)
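One Gibbs step for the cluster assignment alone might be sketched as follows. Everything here is an assumption made for illustration: the shared-prefix similarity is a crude stand-in for the probabilistic model over strings, the new-cluster score is an arbitrary constant, and the part-of-speech and inflection updates are omitted. Words are assumed to be non-empty and `alpha` positive.

```python
import random

def shared_prefix_similarity(a, b):
    """Stand-in for the string model: length of the shared prefix divided
    by the length of the shorter word."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / min(len(a), len(b))

def cluster_weights(word, clusters, alpha, similarity, new_cluster_score=0.1):
    """Unnormalized weight per cluster: CRP prior (cluster size) times the
    average similarity to its members; the last entry is a new cluster."""
    weights = []
    for members in clusters:
        average = sum(similarity(word, m) for m in members) / len(members)
        weights.append(len(members) * average)
    weights.append(alpha * new_cluster_score)
    return weights

def gibbs_reassign(word, clusters, alpha, similarity, rng):
    """Remove the word from its current cluster, then resample its cluster."""
    clusters = [[m for m in c if m != word] for c in clusters]
    clusters = [c for c in clusters if c]  # drop clusters left empty
    weights = cluster_weights(word, clusters, alpha, similarity)
    choice = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    if choice == len(clusters):
        clusters.append([word])
    else:
        clusters[choice].append(word)
    return clusters

clusters = [["raining", "rainier"], ["kicked", "kicking"], ["rained"]]
print(gibbs_reassign("rained", clusters, 1.0, shared_prefix_similarity,
                     random.Random(0)))
```

A full sampler would repeat this step for every word, many times over, interleaved with the tag updates listed above.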

Training
• Once the grouping of the words becomes stable, we train the parameters on the groups, using features extracted from the words
• Parameters:
1. θ: probabilistic model over the strings
2. τ: trigram part-of-speech tagger
3. ϕ: inflection tagging model

[Diagram: POS tagging (trigram model), DPMM, inflection model (log-linear)]
Note: the model does not depend on word counts, which solves the unknown-words problem

Reflection
1. Parts of the code are implemented, but I was not able to put everything together
2. Hence there are no experiments, and I was not able to fully explore the models (no model tweaking)
3. This was a research-based project, and too much time was spent on learning... in order to put together a paper
4. Learned many new methods and re-learned already known methods really well
5. Getting a Korean font installed for the MiKTeX distribution of LaTeX is hard.
