


  1. Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing Workshop on Statistical Parsing of Morphologically-Rich Languages 12th International Conference on Parsing Technologies Jinho D. Choi & Martha Palmer University of Colorado at Boulder October 6th, 2011 choijd@colorado.edu Thursday, October 6, 2011

  2. Dependency Parsing in Korean
  • Why dependency parsing in Korean?
    - Korean is a flexible word order language.
  [Figure: two constituent trees for the same sentence. Canonical SOV order: 그녀는 아직 그를 사랑했다 (She still him loved). Object-fronted order: 그를 그녀는 아직 *T* 사랑했다 (Him she still *T* loved). The dependency labels SBJ, ADV, and OBJ stay the same under either word order.]

  3. Dependency Parsing in Korean
  • Why dependency parsing in Korean?
    - Korean is a flexible word order language.
    - Rich morphology makes dependency parsing easier.
  [Example: 그녀 + 는 (She + aux. particle), 그 + 를 (He + obj. case marker), 사랑했다 (loved); dependencies: 그녀는 (SBJ), 아직 (ADV, "still"), 그를 (OBJ) → 사랑했다.]

  4. Dependency Parsing in Korean
  • Statistical dependency parsing in Korean
    - Sufficiently large training data is required.
    - Not much training data is available for Korean dependency parsing.
  • Constituent Treebanks in Korean
    - Penn Korean Treebank: 15K sentences.
    - KAIST Treebank: 30K sentences.
    - Sejong Treebank: 60K sentences.
      • The most recent and largest Treebank in Korean.
      • Contains Penn Treebank style constituent trees.

  5. Sejong Treebank
  • Phrase structure
    - Includes phrase tags, POS tags, and function tags.
    - Each token can be broken into several morphemes.
    - Tokens are mostly separated by white spaces.
  [Tree: S → NP-SBJ + VP; VP → AP + VP; VP → NP-OBJ + VP. Token analyses: 그녀(she)/NP + 는/JX, 아직/MAG, 그/NP + 를/JKO, 사랑/NNG + 하/XSV + 았/EP + 다/EF — "She still him loved."]

  6. Sejong Treebank
  Phrase-level tags: S Sentence, Q Quotative clause, NP Noun phrase, VP Verb phrase, VNP Copula phrase, AP Adverb phrase, DP Adnoun phrase, IP Interjection phrase.
  Function tags: SBJ Subject, OBJ Object, CMP Complement, MOD Noun modifier, AJT Predicate modifier, CNJ Conjunctive, INT Vocative, PRN Parenthetical.
  POS tags (CP: case particle, PR: particle, EM: ending marker, DS: derivational suffix):
    NNG General noun, NNP Proper noun, NNB Bound noun, NP Pronoun, NR Numeral,
    VV Verb, VA Adjective, VX Auxiliary predicate, VCP Copula, VCN Negation adjective,
    MM Adnoun, MAG General adverb, MAJ Conjunctive adverb, IC Interjection,
    JKS Subjective CP, JKC Complemental CP, JKG Adnomial CP, JKO Objective CP, JKB Adverbial CP, JKV Vocative CP, JKQ Quotative CP, JX Auxiliary PR, JC Conjunctive PR,
    EP Prefinal EM, EF Final EM, EC Conjunctive EM, ETN Nominalizing EM, ETM Adnominalizing EM,
    XPN Noun prefix, XSN Noun DS, XSV Verb DS, XSA Adjective DS, XR Base morpheme,
    SF SP SS SE SO SW Punctuation, SN Number, SL Foreign word, SH Chinese word,
    NF Noun-like word, NV Predicate-like word, NA Unknown word.

  7. Dependency Conversion
  • Conversion steps
    - Find the head of each phrase using head-percolation rules.
      • All other nodes in the phrase become dependents of the head.
    - Re-direct dependencies for empty categories.
      • Empty categories are not annotated in the Sejong Treebank.
      • Skipping this step generates only projective dependency trees.
    - Label dependencies (labels are generated automatically).
  • Special cases
    - Coordination, nested function tags.

  8. Dependency Conversion
  • Head-percolation rules
    - Derived by analyzing each phrase in the Sejong Treebank.
    - Korean is a head-final language.
      S     r  VP;VNP;S;NP|AP;Q;*
      Q     l  S|VP|VNP|NP;Q;*
      NP    r  NP;S;VP;VNP;AP;*
      VP    r  VP;VNP;NP;S;IP;*
      VNP   r  VNP;NP;S;*
      AP    r  AP;VP;NP;S;*
      DP    r  DP;VP;*
      IP    r  IP;VNP;*
      X|L|R r  *
    - There are no rules for finding the head morpheme within each token.
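The rules above can be read as: for a phrase with the given tag, scan its children from the right (`r`) or left (`l`) and pick the first child matching the highest-priority pattern. A minimal sketch of applying such a rule table (the dictionary encoding and function are our illustration, not the authors' converter):

```python
# Head-percolation rules from the slide: phrase tag -> (direction, priority list).
# 'r' searches children right-to-left (Korean is head-final), 'l' left-to-right.
RULES = {
    "S":   ("r", ["VP", "VNP", "S", "NP|AP", "Q", "*"]),
    "Q":   ("l", ["S|VP|VNP|NP", "Q", "*"]),
    "NP":  ("r", ["NP", "S", "VP", "VNP", "AP", "*"]),
    "VP":  ("r", ["VP", "VNP", "NP", "S", "IP", "*"]),
    "VNP": ("r", ["VNP", "NP", "S", "*"]),
    "AP":  ("r", ["AP", "VP", "NP", "S", "*"]),
    "DP":  ("r", ["DP", "VP", "*"]),
    "IP":  ("r", ["IP", "VNP", "*"]),
}

def find_head(phrase_tag, child_tags):
    """Return the index of the head child of a phrase.

    `child_tags` lists the children's tags left to right; function tags
    such as -SBJ are stripped before matching.
    """
    direction, priorities = RULES.get(phrase_tag, ("r", ["*"]))
    indices = list(range(len(child_tags)))
    if direction == "r":
        indices.reverse()
    for pattern in priorities:
        alternatives = pattern.split("|")
        for i in indices:
            base = child_tags[i].split("-")[0]  # drop function tag suffix
            if pattern == "*" or base in alternatives:
                return i
    return indices[0]
```

For example, `find_head("S", ["NP-SBJ", "VP"])` picks the VP, and every other child of the S becomes its dependent.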

  9. Dependency Conversion
  • Dependency labels
    - Labels retained from the function tags.
    - Labels inferred from constituent relations.
  Algorithm 1: Getting inferred labels.
    input:  (c, p), where c is a dependent of p
    output: a dependency label l for the arc c <-(l)- p
    begin
      if   p = root          then l <- ROOT
      elif c.pos = AP        then l <- ADV
      elif p.pos = AP        then l <- AMOD
      elif p.pos = DP        then l <- DMOD
      elif p.pos = NP        then l <- NMOD
      elif p.pos = VP|VNP|IP then l <- VMOD
      else                        l <- DEP
    end
  [Example: She (SBJ) still (ADV) him (OBJ) loved.]
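Algorithm 1 can be transcribed directly; a minimal sketch assuming phrase tags are plain strings (the function signature is our choice):

```python
def infer_label(c_pos, p_pos, p_is_root=False):
    """Infer a dependency label for dependent c attached to head p,
    following Algorithm 1 on the slide."""
    if p_is_root:
        return "ROOT"
    if c_pos == "AP":            # dependent is an adverb phrase
        return "ADV"
    if p_pos == "AP":            # head is an adverb phrase
        return "AMOD"
    if p_pos == "DP":            # head is an adnoun phrase
        return "DMOD"
    if p_pos == "NP":            # head is a noun phrase
        return "NMOD"
    if p_pos in ("VP", "VNP", "IP"):
        return "VMOD"
    return "DEP"                 # default label
```

Note the checks are ordered: an AP dependent gets ADV regardless of its head's tag.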

  10. Dependency Conversion
  • Coordination
    - Previous conjuncts become dependents of the following conjuncts.
  • Nested function tags
    - Nodes with nested f-tags become the heads of their phrases.
  [Example: "I_and he_and she home left" — each earlier conjunct attaches to the next conjunct (CNJ, CNJ); "she" is the subject (SBJ) and "home" the object (OBJ) of "left".]
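The coordination rule is simple enough to state as a one-liner; a sketch over a list of conjunct tokens (the tuple output format is our illustration):

```python
def convert_coordination(conjuncts):
    """Attach each conjunct to the conjunct that follows it,
    labeled CNJ, as in the slide's coordination rule."""
    return [(conjuncts[k], conjuncts[k + 1], "CNJ")
            for k in range(len(conjuncts) - 1)]
```

So for "I_and he_and she", "I_and" depends on "he_and" and "he_and" on "she", the final conjunct, which then carries the whole coordination's role (here SBJ).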

  11. Dependency Parsing
  • Dependency parsing algorithm
    - Transition-based, non-projective parsing algorithm (Choi & Palmer, 2011).
    - Selectively performs transitions from both projective and non-projective dependency parsing algorithms.
    - Linear-time parsing speed in practice, even for non-projective trees.
  • Machine learning algorithm
    - LIBLINEAR: L2-regularized L1-loss support vector machine.
  Jinho D. Choi & Martha Palmer. 2011. Getting the Most out of Transition-based Dependency Parsing. In Proceedings of ACL:HLT'11.

  12. Dependency Parsing
  • Feature selection
    - Each token consists of multiple morphemes (up to 21).
    - POS tag feature of each token?
      • (NNG & XSV & EP & EF & SF) vs. (NNG | XSV | EP | EF | SF)
      • Sparse information vs. lack of information. Happy medium?
  [Example tokens:
    낙랑/NNP + 공주/NNG + 는/JX       (Nakrang + Princess + JX)
    호동/NNP + 왕자/NNG + 를/JKO      (Hodong + Prince + JKO)
    사랑/NNG + 하/XSV + 았/EP + 다/EF + ./SF  (Love + XSV + EP + EF + .)]

  13. Dependency Parsing
  • Morpheme selection
    FS: the first morpheme
    LS: the last morpheme before JK|DS|EM
    JK: particles (J* in Table 1)
    DS: derivational suffixes (XS* in Table 1)
    EM: ending markers (E* in Table 1)
    PY: the last punctuation, only if no other morpheme follows it
  [Example: from 낙랑/NNP + 공주/NNG + 는/JX, select FS = 낙랑/NNP, LS = 공주/NNG, JK = 는/JX; from 사랑/NNG + 하/XSV + 았/EP + 다/EF + ./SF, select FS = LS = 사랑/NNG, DS = 하/XSV, EM = 다/EF, PY = ./SF.]
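One plausible reading of the selection table can be sketched as follows (the data shapes, tag tests, and tie-breaking to the last matching morpheme are our assumptions, not the authors' code):

```python
# Sejong punctuation tags (Table 1 on slide 6).
PUNCT = {"SF", "SP", "SS", "SE", "SO", "SW"}

def is_jk(pos): return pos.startswith("J")    # particles
def is_ds(pos): return pos.startswith("XS")   # derivational suffixes
def is_em(pos): return pos.startswith("E")    # ending markers

def select_morphemes(morphemes):
    """Pick the important morphemes of one token.

    `morphemes` is a list of (form, pos) pairs; returns a dict keyed by
    the slide's FS/LS/JK/DS/EM/PY labels.
    """
    sel = {"FS": morphemes[0]}
    # LS: last morpheme before any particle / derivational suffix / ending marker
    ls = None
    for m in morphemes:
        if is_jk(m[1]) or is_ds(m[1]) or is_em(m[1]):
            break
        ls = m
    if ls is not None:
        sel["LS"] = ls
    for m in morphemes:
        if is_jk(m[1]):
            sel["JK"] = m
        elif is_ds(m[1]):
            sel["DS"] = m
        elif is_em(m[1]):
            sel["EM"] = m
    # PY: the last punctuation, only if no other morpheme follows it
    if morphemes[-1][1] in PUNCT:
        sel["PY"] = morphemes[-1]
    return sel
```

On 사랑/NNG + 하/XSV + 았/EP + 다/EF + ./SF this keeps NNG, XSV, EF, and SF and drops the prefinal EP, matching the slide's example.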

  14. Dependency Parsing
  • Feature extraction
    - Extract features using only the important morphemes.
    - Individual POS tag features of the 1st and 3rd tokens:
      NNP1, NNG1, JK1, NNG3, XSV3, EF3
    - Joined POS tag features between the 1st and 3rd tokens:
      NNP1_NNG3, NNP1_XSV3, NNP1_EF3, JK1_NNG3, JK1_XSV3, ...
    - Tokens used: w_i, w_j, w_i±1, w_j±1
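The individual and joined POS-tag features can be generated mechanically; a minimal sketch assuming each token is reduced to the POS tags of its selected morphemes (the mapping and feature-string format are our illustration):

```python
def pos_features(token_pos, i, j):
    """Generate POS-tag features for a token pair (i, j).

    `token_pos` maps a token index to the list of POS tags of its
    selected morphemes. Individual features tag each POS with its
    token index; joined features pair every tag of token i with
    every tag of token j.
    """
    individual = [f"{p}{idx}" for idx in (i, j) for p in token_pos[idx]]
    joined = [f"{pi}{i}_{pj}{j}"
              for pi in token_pos[i] for pj in token_pos[j]]
    return individual + joined
```

With token 1 = NNP/NNG/JK and token 3 = NNG/XSV/EF this yields the 6 individual and 9 joined features sketched on the slide, keeping the feature space far smaller than joining all 21 possible morphemes per token.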

  15. Experiments
  • Corpora
    - Dependency trees converted from the Sejong Treebank.
    - The corpus consists of 20 sources in 6 genres:
      Newspaper (NP), Magazine (MZ), Fiction (FI), Memoir (ME), Informative Book (IB), and Educational Cartoon (EC).
    - Evaluation sets are very diverse compared to the training sets.
      • This ensures the robustness of our parsing models.
  Number of sentences in each set (T: training, D: development, E: evaluation):
        NP     MZ     FI      ME     IB     EC
    T   8,060  6,713  15,646  5,053  7,983  1,548
    D   2,048  -      2,174   -      1,307  -
    E   2,048  -      2,175   -      1,308  -
