Better Arabic Parsing: Baselines, Evaluations, and Analysis
Spence Green and Christopher D. Manning, Stanford University
August 27, 2010
Outline: Motivation | Syntax and Annotation | Multilingual Parsing | Grammar Development | Arabic Experiments

Common Multilingual Parsing Questions...
Is language X “harder” to parse than language Y?
◮ Morphologically-rich X
Is treebank X “better/worse” than treebank Y?
Does feature Z “help more” for language X than Y?
◮ Lexicalization
◮ Morphological annotations
◮ Markovization
◮ etc.
“Underperformance” Relative to English
Evalb F1, all sentence lengths (Petrov, 2009):
  Arabic (this paper)  81.1
  Arabic               75.8
  Italian              75.6
  French               77.9
  German               80.1
  Bulgarian            81.6
  Chinese              83.7
  English              90.1
Why Arabic / Penn Arabic Treebank (ATB)?
◮ Annotation style similar to PTB
◮ Relatively little segmentation (cf. Chinese)
◮ Richer morphology (cf. English)
◮ More syntactic ambiguity (unvocalized)
ATB Details
Parts 1–3 (not including part 3, v3.2)
Newswire only
◮ Agence France Presse, Al-Hayat, Al-Nahar
Corpus/experimental characteristics
◮ 23k trees
◮ 740k tokens
◮ Shortened “Bies” POS tags
◮ Split: 2005 JHU workshop
Arabic Preliminaries
Diglossia: “Arabic” → MSA
Typology: VSO — VOS, SVO, VO also possible
Devocalization [Arabic script example not legible in this copy]
Segmentation: an analyst’s choice!
◮ ATB uses clitic segmentation [Arabic example not legible in this copy]
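What clitic segmentation does can be illustrated with a toy rule-based splitter over Buckwalter transliteration. The clitic lists, the length thresholds, and the example word ktAb (“book”) are simplified assumptions for illustration only, not the ATB’s actual (human-annotated) segmentation procedure:

```python
# Toy illustration of ATB-style clitic segmentation (Buckwalter transliteration).
# The ATB's segmentation is produced by annotators with morphological analyzers;
# these rules are a deliberately simplified sketch.

PROCLITICS = ["w", "f", "b", "l", "k"]               # e.g. w- "and", b- "with"
ENCLITICS = ["hA", "hm", "nA", "km", "h", "k", "y"]  # pronominal clitics, e.g. -hA "her/its"

def segment(token):
    """Split one unvocalized token into proclitic + stem + enclitic pieces."""
    parts = []
    # Strip at most one proclitic from the front, keeping a plausible stem.
    for p in PROCLITICS:
        if token.startswith(p) and len(token) > len(p) + 2:
            parts.append(p + "+")
            token = token[len(p):]
            break
    # Strip at most one pronominal enclitic from the end.
    suffix = None
    for e in ENCLITICS:
        if token.endswith(e) and len(token) > len(e) + 2:
            suffix = "+" + e
            token = token[:-len(e)]
            break
    parts.append(token)
    if suffix:
        parts.append(suffix)
    return parts

print(segment("wktAbhA"))  # ['w+', 'ktAb', '+hA'] : "and her book"
```

The point of the sketch is only that one orthographic token can contain several syntactic words, which is why segmentation is an annotation decision rather than a fact of the script.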
Syntactic Ambiguity in Arabic
This talk:
◮ Devocalization
◮ Discourse-level coordination ambiguity
Many other types:
◮ Adjectives / adjective phrases
◮ Process nominals — maSdar
◮ Attachment in annexation constructs (Gabbard and Kulick, 2008)
Devocalization: Inna and her Sisters
(All four particles share the same unvocalized written form.)
Particle        POS  Head of
inna “indeed”   VBP  VP
anna “that”     IN   SBAR
in “if”         IN   SBAR
an “to”         IN   SBAR
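The four particles differ only in diacritics and hamza seat, so devocalization collapses them to one surface string. A small sketch, assuming Buckwalter transliteration (the normalization rules below are a simplification of standard practice):

```python
# Sketch: why inna/anna/in/an are ambiguous once devocalized.
# Vocalized forms in Buckwalter transliteration:
#   <in~a "indeed", >an~a "that", <in "if", >an "to"
import re

def devocalize(buckwalter):
    """Strip short vowels, shadda, and tanween; normalize hamza/madda to bare alif."""
    s = re.sub(r"[aiuo~FKN]", "", buckwalter)  # diacritics
    s = re.sub(r"[<>|]", "A", s)               # hamza-on-alif, hamza-under-alif, madda
    return s

forms = ["<in~a", ">an~a", "<in", ">an"]
print({f: devocalize(f) for f in forms})
# All four collapse to the same unvocalized string "An".
```

This is exactly the ambiguity the slide’s table describes: the parser must choose among VBP/VP and IN/SBAR analyses for one written form.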
Example: “she added” (VBD) introducing a quoted clause beginning with inna (“Indeed”) before “Saddam”.
◮ Reference: the quotation is an S whose initial inna is tagged VBP.
◮ Stanford parser: the quotation is parsed as an SBAR, with inna mistagged IN.
Discourse-level Coordination Ambiguity
[Tree diagrams: sentence-initial wa “and” as CC under S, with two competing internal analyses]
◮ S < S in 27.0% of dev set trees
◮ NP < CC in 38.7% of dev set trees
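The two configuration counts above can be gathered mechanically from a treebank. A minimal sketch, assuming trees encoded as (label, children) tuples with string leaves (my own representation, not the paper’s tooling):

```python
# Sketch: count parent<child configurations (e.g. S immediately dominating S,
# NP immediately dominating CC) over trees in (label, children) form.

def count_config(tree, parent_label, child_label):
    """Count nodes labeled parent_label with a direct child labeled child_label."""
    label, children = tree
    n = 0
    for c in children:
        if isinstance(c, tuple):
            if label == parent_label and c[0] == child_label:
                n += 1
            n += count_config(c, parent_label, child_label)
    return n

# Toy tree with sentence-initial wa ("and") and an NP-internal conjunction.
t = ("S", [("CC", ["w"]),
           ("S", [("NP", [("CC", ["w"]), ("NN", ["x"])])])])
print(count_config(t, "S", "S"), count_config(t, "NP", "CC"))  # 1 1
```

Run over the dev set, counts like these yield the 27.0% (S < S) and 38.7% (NP < CC) figures on the slide.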
Leaf Ancestor metric: lowest-scoring reference chains (Berkeley parser); score ∈ [0, 1]
Score   # Gold  Chain
0.696   34      S < S < VP < NP < PRP
0.756   170     S < VP < NP < CC
0.768   31      S < S < VP < S < VP < PP < IN
0.796   86      S < S < VP < SBAR < IN
0.804   52      S < S < NP < NN
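The leaf-ancestor metric scores each terminal by comparing its chain of ancestor labels in the gold and parsed trees. Extracting those chains is straightforward; a sketch with a minimal bracketed-tree reader of my own (the real metric then aligns gold and test chains by edit distance, which is omitted here):

```python
# Sketch: extract each leaf's ancestor-label chain ("lineage") from a
# Penn-style bracketed tree, as used by the leaf-ancestor metric.

def parse_tree(s):
    """Minimal reader: one bracketed tree -> (label, children); leaves are strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def read(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = read(i)
                children.append(child)
            else:                      # word under a preterminal, e.g. (PRP she)
                children.append(tokens[i])
                i += 1
        return (label, children), i + 1
    tree, _ = read(0)
    return tree

def lineages(tree, ancestors=()):
    """Yield (word, root-to-leaf chain of ancestor labels) for each leaf."""
    label, children = tree
    chain = ancestors + (label,)
    for c in children:
        if isinstance(c, tuple):
            yield from lineages(c, chain)
        else:
            yield (c, chain)

t = parse_tree("(S (NP (PRP she)) (VP (VBD added) (SBAR (IN that))))")
for word, chain in lineages(t):
    print(word, ":", " < ".join(chain))
```

The chains printed here have the same root-first “S < VP < …” shape as the table above.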
Treebank Comparison
Compared ATB gross corpus statistics to:
◮ Chinese — CTB6
◮ English — WSJ sect. 2–23
◮ German — Negra
The ATB isn’t that unusual!
Corpus Features in Favor of the ATB
         NT/T ratio   OOV rate
WSJ      0.82         13.2%
Negra    0.46         30.5%
CTB6     1.18         22.2%
ATB      1.04         16.8%
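Both statistics are cheap to compute directly from bracketed trees. A simplified sketch that assumes every preterminal (TAG word) dominates exactly one terminal and ignores traces and empty elements (the toy tree and vocabulary below are illustrative, not ATB data):

```python
# Sketch: phrasal-nonterminal/terminal ratio and OOV rate, computed
# textually from Penn-style bracketed tree strings.
import re

def corpus_stats(trees, train_vocab):
    """Return (phrasal NT / terminal ratio, OOV rate) over a list of tree strings."""
    nodes = terms = oov = 0
    for t in trees:
        nodes += t.count("(")                         # every node opens a bracket
        words = re.findall(r"\(\S+ ([^()\s]+)\)", t)  # terminals under preterminals
        terms += len(words)
        oov += sum(1 for w in words if w not in train_vocab)
    phrasal = nodes - terms                           # subtract the preterminals
    return phrasal / terms, oov / terms

trees = ["(S (NP (DT the) (NN book)) (VP (VBD fell)))"]
ratio, oov = corpus_stats(trees, train_vocab={"the", "fell"})
print(round(ratio, 2), round(oov, 2))  # 1.0 0.33 (toy numbers, not the ATB's)
```

A higher NT/T ratio means more structure per word for the parser to learn; a lower OOV rate means better lexical coverage at test time. On both counts the ATB compares favorably with CTB6 and Negra.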
Sentence Length Negatively Affects Parsing
Avg. sentence length: WSJ 23.8 | Negra 17.2 | CTB6 27.7 | ATB 31.5
40 words is not a sufficient limit for evaluation!
Developing a Manually Annotated Grammar
Klein and Manning (2003)-style state splits
◮ Human-interpretable
◮ Features can inform treebank revision
Example: an NP over NN followed by NP (the idafa construct) is relabeled NP—idafa, distinguishing it from an ordinary NP over DTNN.
Alternative: automatic splits (Berkeley parser)
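A manual state split of this kind amounts to a tree-transformation pass over the training trees. A minimal sketch; the detection rule (NN directly followed by an NP sibling) and the Buckwalter example words are simplified assumptions, not the paper’s exact predicate:

```python
# Sketch: Klein-and-Manning-style manual state split, relabeling NP as
# NP-idafa when its head noun is directly followed by an NP sibling
# (a simplified stand-in for the construct-state test).

def mark_idafa(tree):
    """Recursively relabel NP -> NP-idafa in (label, children) trees."""
    label, children = tree
    kids = [mark_idafa(c) if isinstance(c, tuple) else c for c in children]
    if (label == "NP" and len(kids) >= 2
            and isinstance(kids[0], tuple) and kids[0][0] == "NN"
            and isinstance(kids[1], tuple) and kids[1][0] == "NP"):
        label = "NP-idafa"
    return (label, kids)

# "ktAb AlTAlb" ("the student's book"), an idafa construct, vs. a plain NP.
t = ("NP", [("NN", ["ktAb"]), ("NP", [("NN", ["AlTAlb"])])])
print(mark_idafa(t)[0])  # NP-idafa
```

Because the split is a readable rule rather than a learned latent state, annotators can inspect it, which is what makes such features usable for treebank revision.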
Feature: markContainsVerb
Tree example: a VB leaf propagates the —hasVerb mark up through its VP, S, and SBAR ancestors (VP—hasVerb, S—hasVerb, SBAR—hasVerb), while NP-SBJ and NP-TPC siblings stay unmarked.
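The feature can be sketched as a bottom-up annotation pass that suffixes every node dominating a verb. This is my own minimal reconstruction of the idea; the paper’s exact trigger tag set may differ, and the example tree is a toy:

```python
# Sketch: markContainsVerb as a bottom-up pass. Any node dominating a
# verbal preterminal gets a -hasVerb suffix.

VERB_TAGS = {"VB", "VBD", "VBP", "VBN"}  # assumed trigger tags

def mark_contains_verb(tree):
    """Return (annotated_tree, dominates_verb_flag); leaves are strings."""
    label, children = tree
    has_verb = label in VERB_TAGS
    new_children = []
    for c in children:
        if isinstance(c, tuple):
            c, child_has = mark_contains_verb(c)
            has_verb = has_verb or child_has
        new_children.append(c)
    if has_verb and label not in VERB_TAGS:
        label += "-hasVerb"
    return (label, new_children), has_verb

t = ("S", [("NP-SBJ", [("NN", ["Alrjl"])]),
           ("VP", [("VBD", ["ktb"])])])
annotated, _ = mark_contains_verb(t)
print(annotated[0])        # S-hasVerb
print(annotated[1][1][0])  # VP-hasVerb
```

Because the mark percolates up, an SBAR or S containing a verb becomes distinguishable from a verbless (e.g. equational) clause, which is exactly the distinction the state split targets.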