Unsupervised Learning of Morphology by Using Syntactic Categories
Burcu Can, Suresh Manandhar
Department of Computer Science, University of York
Morpho Challenge, 2009


  1. Unsupervised Learning of Morphology by Using Syntactic Categories. Burcu Can, Suresh Manandhar. Department of Computer Science, University of York. Morpho Challenge, 2009.

  2. Outline
     1. Introduction
     2. Model Description
        - Inducing Syntactic Categories
        - Inducing Morphological Paradigms
        - Merging Paradigms
        - Morphological Segmentation
     3. Results
        - Datasets
        - Model Parameters
        - Results
     4. Conclusion

  3. Introduction: Morphology and Part-of-Speech (PoS)
     - The correlation between morphological and syntactic information is the inspiration for another approach to morphology learning.
     - Example:
       - PoS category 1: present participles. Words: going, walking, washing, ...
       - PoS category 2: adverbs. Words: badly, deeply, strongly, ...
       - PoS category 3: plural nouns. Words: students, pupils, girls, families, ...
     - This offers the chance to learn the two kinds of knowledge (morphology and PoS) jointly.

  4. Introduction: Previous Research Using Morphology and PoS Together
     - Hu et al. [4] extend the Minimum Description Length (MDL) based framework of Goldsmith [3], exploring the link between morphological signatures and PoS tags.
     - Clark and Tim [2] experiment with fixed word endings for PoS clustering.
     - Our work: a clustering algorithm based on PoS categories for inducing morphological paradigms.

  7. Outline (repeated): current section is Model Description, Inducing Syntactic Categories

  8. Inducing Syntactic Categories: Clark's [1] syntactic clustering method
     - Clark's [1] distributional clustering approach is used to induce syntactic categories.
     - Each word is clustered by using its context (the previous and the following word).
     - The distributional similarity between words is measured with the Kullback-Leibler (KL) divergence:
       D(p || q) = Σ_x p(x) log ( p(x) / q(x) )    (1)
       where p, q are the context distributions of the words being compared and x ranges over contexts.
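As a concrete illustration of Eq. (1), the following Python sketch collects the (previous word, next word) context counts of a word and computes the KL divergence between two smoothed context distributions. The add-one smoothing and the helper names are assumptions made for this sketch, not details taken from Clark [1].

```python
from collections import Counter
from math import log

def context_counts(word, tokens):
    """Count the (previous word, next word) contexts of `word` in a token list."""
    counts = Counter()
    for i in range(1, len(tokens) - 1):
        if tokens[i] == word:
            counts[(tokens[i - 1], tokens[i + 1])] += 1
    return counts

def kl_divergence(counts_p, counts_q, all_contexts):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)), as in Eq. (1).
    Add-one smoothing over `all_contexts` keeps q(x) > 0; this is an
    assumption of the sketch, not necessarily what Clark [1] uses."""
    total_p = sum(counts_p.values()) + len(all_contexts)
    total_q = sum(counts_q.values()) + len(all_contexts)
    d = 0.0
    for x in all_contexts:
        p_x = (counts_p.get(x, 0) + 1) / total_p
        q_x = (counts_q.get(x, 0) + 1) / total_q
        d += p_x * log(p_x / q_x)
    return d
```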

  11. Inducing Syntactic Categories: Clark's [1] syntactic clustering method (continued)
     - In Clark's approach [1], the probability of a context for a target word is defined as:
       p(<w1, w2>) = p(<c(w1), c(w2)>) p(w1 | c(w1)) p(w2 | c(w2))    (2)
       where c(w1), c(w2) denote the PoS clusters of the words w1 and w2 respectively.
     - The algorithm starts with K clusters seeded with the most frequent words, and gradually fills them with the words that have the minimum KL divergence to one of the K clusters.
     - We set K = 77, the number of tags defined in the CLAWS tagset.
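The greedy filling step can be pictured roughly as follows. This is a minimal sketch that assumes each cluster is summarised by the pooled context counts of its members and that the context_counts and kl_divergence helpers sketched above are available; the seeding details and the full probability model of Eq. (2) in Clark [1] are richer than this.

```python
from collections import Counter

def assign_words(words, clusters, tokens, all_contexts):
    """Greedily add each word to the cluster whose pooled context
    distribution is closest in KL divergence (a simplification of
    Clark's procedure [1]). `clusters` is a list of seed word lists."""
    cluster_counts = [Counter() for _ in clusters]
    for k, seed_words in enumerate(clusters):
        for w in seed_words:
            cluster_counts[k] += context_counts(w, tokens)
    for w in words:
        w_counts = context_counts(w, tokens)
        best = min(range(len(clusters)),
                   key=lambda k: kl_divergence(w_counts, cluster_counts[k],
                                               all_contexts))
        clusters[best].append(w)
        cluster_counts[best] += w_counts
    return clusters
```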

  14. Inducing Syntactic Categories: some example PoS clusters
     - Cluster 1: much, far, badly, deeply, strongly, thoroughly, busy, rapidly, slightly, heavily, neatly, widely, closely, easily, profoundly, readily, eagerly, ...
     - Cluster 2: made, found, held, kept, bought, heard, played, left, passed, finished, lost, changed, ...
     - Cluster 3: should, may, could, would, will, might, did, does, ...
     - Cluster 4: working, travelling, flying, fighting, running, moving, playing, turning, ...
     - Cluster 5: people, men, women, children, girls, horses, students, pupils, staff, families, ...

  15. Outline (repeated): current section is Model Description, Inducing Morphological Paradigms

  16. Inducing Morphological Paradigms: paradigm definition
     - Morphemes are tied to PoS clusters.
     - Our definition of a paradigm deviates from that of Goldsmith [3] in that:
       - A paradigm φ is a list of morpheme/cluster pairs, i.e. φ = {m1/c1, ..., mn/cn}.
       - Associated with each paradigm is a list of stems, i.e. the stems that can combine with each morpheme mi to produce a word belonging to the PoS category ci.
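One possible in-memory representation of this structure is sketched below; the Paradigm class and its field names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Paradigm:
    """phi = {m1/c1, ..., mn/cn}, plus the stems shared by all pairs."""
    pairs: list = field(default_factory=list)   # (morpheme, cluster id) tuples
    stems: set = field(default_factory=set)     # stems that combine with every pair

    def words(self):
        """All stem + morpheme concatenations the paradigm licenses."""
        return {stem + m for stem in self.stems for m, _ in self.pairs}
```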

  17. Inducing Morphological Paradigms: algorithm for capturing paradigms across PoS clusters
     1: Apply unsupervised PoS clustering to the input corpus
     2: Split all the words in each PoS cluster at all split points, and create potential morphemes
     3: For each PoS cluster c and morpheme m, compute maximum likelihood estimates of p(m | c)
     4: Keep all m (in c) with p(m | c) > t, where t is a threshold
     5: for all PoS clusters c1, c2 do
     6:     Pick morphemes m1 in c1 and m2 in c2 with the highest number of common stems
     7:     Store φ = {m1/c1, m2/c2} as the new paradigm
     8:     Remove all words in c1 with morpheme m1 and associate these words with φ
     9:     Remove all words in c2 with morpheme m2 and associate these words with φ
     10: end for
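To make the steps concrete, here is a rough Python sketch of steps 2-10, reusing the hypothetical Paradigm class from the earlier sketch. The default threshold t, the restriction of morphemes to suffixes, and the order in which cluster pairs are visited are assumptions made for illustration, not the authors' exact settings.

```python
from collections import Counter
from itertools import combinations

def candidate_morphemes(words, t=0.01):
    """Steps 2-4: split every word at every position, take the suffix as a
    potential morpheme, estimate p(m | c) by maximum likelihood over all
    splits, and keep morphemes whose probability exceeds the threshold t."""
    counts = Counter(w[i:] for w in words for i in range(len(w) + 1))
    total = sum(counts.values())
    return {m for m, n in counts.items() if n / total > t}

def stems_of(morpheme, words):
    """Stems obtained by stripping `morpheme` from the words that carry it."""
    return {w[: len(w) - len(morpheme)] for w in words if w.endswith(morpheme)}

def induce_paradigms(clusters, t=0.01):
    """Steps 5-10: for every pair of PoS clusters, pick the morpheme pair
    with the most common stems, store it as a paradigm, and remove the
    covered words from both clusters. `clusters` maps cluster id -> word list."""
    morphs = {c: candidate_morphemes(ws, t) for c, ws in clusters.items()}
    paradigms = []
    for c1, c2 in combinations(clusters, 2):
        pairs = [(m1, m2) for m1 in morphs[c1] for m2 in morphs[c2]]
        if not pairs:
            continue
        m1, m2 = max(pairs, key=lambda p: len(stems_of(p[0], clusters[c1]) &
                                               stems_of(p[1], clusters[c2])))
        stems = stems_of(m1, clusters[c1]) & stems_of(m2, clusters[c2])
        if stems:
            paradigms.append(Paradigm(pairs=[(m1, c1), (m2, c2)], stems=stems))
            clusters[c1] = [w for w in clusters[c1] if not w.endswith(m1)]
            clusters[c2] = [w for w in clusters[c2] if not w.endswith(m2)]
    return paradigms
```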
