Universal Dependencies are hard to parse – or are they?

Ines Rehbein♣, Julius Steen⋆, Bich-Ngoc Do⋆, Anette Frank⋆
Leibniz ScienceCampus
Institut für Deutsche Sprache Mannheim♣
Universität Heidelberg⋆
Germany
{rehbein,steen,do,frank}@cl.uni-heidelberg.de

Abstract

Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In the paper, we ask what exactly causes the decrease in parsing accuracy when training a parser on UD-style annotations and whether the effect is similarly strong for all languages. We conduct a series of experiments where we systematically modify individual annotation decisions taken in the UD scheme and show that this results in an increased accuracy for most, but not for all languages. We show that the encoding in the UD scheme, in particular the decision to encode content words as heads, causes an increase in dependency length for nearly all treebanks and an increase in arc direction entropy for many languages, and evaluate the effect this has on parsing accuracy.

1 Introduction

Syntactic parsing, and in particular dependency parsing, is an important preprocessing step for many NLP applications. Many different parsing models are available for many different languages, and also a number of annotation schemes that differ with respect to the linguistic decisions they take. One of them is the Universal Dependencies (UD) scheme (Nivre et al., 2016) that has been developed to support cross-lingual parser transfer, and cross-lingual NLP tasks in general, and to provide a foundation for a sound cross-lingual evaluation.

While the value of the UD framework for multilingual applications is beyond doubt, it has been discussed that the annotation decisions taken in the UD framework are likely to decrease parsing accuracies, as most dependency-based parsers prefer a chain representation of shorter dependencies over the UD-style encoding of dependencies where content words are heads, with function words attached as dependent nodes (content-head encoding). This is especially relevant for the encoding of coordinations, copula, and prepositions (Marneffe et al., 2014) (see figure 1). Several studies have addressed this problem and presented experiments on converted trees, offering evidence that a function-head encoding might increase the learnability of the annotation scheme (Schwartz et al., 2012; Popel et al., 2013; Silveira and Manning, 2015; Rosa, 2015; Versley and Kirilin, 2015; Kohita et al., 2017).

Evaluating the learnability of annotation frameworks, however, is not straightforward, and attempts to do so have often resulted in an apples-to-oranges comparison, as there are multiple factors that can impact parsing performance, including the language, the annotation scheme, the size of the treebank, and the parsing model. Even text-intrinsic properties such as the domain and genre of the texts that are included in the treebank can influence results (Rehbein and van Genabith, 2007). It is not possible to control for all of them, and this has made it extremely difficult to come to conclusions concerning the learnability of syntactic representations for different languages or annotation frameworks.

In the paper, we show that the design decisions taken in the UD framework have a negative impact on the learnability of the annotations for many languages, but not for all. We do this by evaluating three important design decisions made in the UD scheme and compare their impact on parsing accuracies for different languages.

The contributions of the paper are as follows. We test the claim that content-head dependencies are harder to parse, using three parsers that implement different parsing paradigms. We present a conversion algorithm that transforms the content-head encoding of the UD treebanks for coordination, copula constructions and for prepositions into a function-head encoding and show that our conversion algorithm yields high accuracies (between 98.4% and 100%) for a back-and-forth conversion of gold trees. We run parsing experiments on the original and the converted UD treebanks and compare the learnability of the annotations across 15 different languages, showing that language-specific properties play a crucial role in the learning process. We further show that the changes in dependency length that result from the different encoding styles are not responsible for the changes in parsing accuracy.

The paper is structured as follows. We first review related work (§2) and present our conversion algorithm (§3). The data and setup for our experiments as well as the results are described in §4. After a short discussion (§5) we conclude (§6).

2 Related work

It is well known from the literature that the linguistic framework used for a particular task has a great impact on the learnability of the annotations. Several studies have tried to evaluate and compare annotation schemes for syntactic parsing of one language (Kübler, 2005; Schwartz et al., 2012; Husain and Agrawal, 2012; Silveira and Manning, 2015) or across languages (Mareček et al., 2013; Rosa, 2015; Kohita et al., 2017), or have investigated the impact of a particular parsing model on the learnability of specific phenomena encoded in the framework (McDonald and Nivre, 2007; Goldberg and Elhadad, 2010).

Popel et al. (2013) present a thorough crosslingual investigation of different ways to encode coordination in a dependency framework. They did not, however, address the issue of learnability of the different encodings. This has been done by Mareček et al. (2013), who reach the somewhat disenchanted conclusion that the observed results of their experiments are "unconvincing and not very promising" (Mareček et al., 2013).

Versley and Kirilin (2015) look at the influence of languages and annotation schemes in universal dependency parsing, comparing 5 different parsers on 5 languages using two variants of UD schemes. They state that encoding content words as heads has a negative impact on parsing results and that PP attachment errors account for a large portion of the differences in accuracy between the different parsers and between treebanks of varying sizes.

Recent work by Gulordava and Merlo (2016) has looked at word order variation and its impact on dependency parsing of 12 languages. They focus on word order freedom and dependency length as two properties of word order that systematically vary between different languages. To assess their impact on parsing accuracy, they modify the original treebanks by minimising the dependency lengths and the entropy of the head direction (whether the head of a dependent dep can be positioned to the left, to the right, or either way), thus creating artificial treebanks with systematically different word order properties. Parsing results on the modified treebanks confirm that a higher variation in word order and longer dependencies have a negative impact on parsing accuracies. These results, however, do not hold for all languages.¹

¹ For German, for instance, word order variability seems to have a much stronger impact on parsing results while optimising dependency length resulted in a lower LAS.

The work of Gulordava and Merlo (2016) cannot be used to compare the impact of different encoding schemes on the learnability of the annotations, as the modifications applied by the authors do result in artificial treebanks and cannot be traced back to specific design decisions, thus making the results hard to interpret for our purposes.

Kohita et al. (2017) overcome this problem by providing a conversion algorithm for the three functional labels case, dep, mark from the UD scheme. They convert the representations for those labels into function-head encodings and present parsing experiments on 19 treebanks from the UD project. Their results corroborate earlier findings and show that the conversions improve results for 16 out of 19 languages, using two graph-based parsers (MST and RBG) with default feature templates.

Our work is similar in spirit to that of Kohita et al. (2017). We do, however, address partly different linguistic phenomena, namely the encoding of adpositions, copula verbs and coordinations. In contrast to Kohita et al. (2017), we do not back-transform the parser output but evaluate the converted trees against a converted version of the gold trees, as it has been shown that the back-conversion results in error propagation, which is reflected in lower parsing accuracies (Silveira and
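The notions that run through this discussion, the content-head versus function-head encoding of prepositions and the word-order measures of dependency length and head-direction entropy, can be made concrete with a minimal Python sketch. This is an illustrative toy, not the authors' conversion algorithm from §3 and not Gulordava and Merlo's exact entropy measure; the simplified token tuples and the unchanged relation labels are assumptions made for brevity:

```python
# Illustrative toy sketch, not the paper's conversion algorithm:
# content-head vs. function-head encoding of a prepositional phrase,
# plus dependency length and a pooled head-direction entropy.
from collections import Counter
from math import log2

# A token is a simplified (id, form, head, deprel) tuple; head 0 is
# the artificial root. Relation labels are left unchanged here,
# although a full conversion would also need to relabel arcs.

def to_function_head(tokens):
    """Turn a UD content-head analysis of a PP into a function-head
    one: the adposition (a 'case' dependent) inherits its noun's
    attachment, and the noun attaches under the adposition."""
    heads = {tid: head for tid, _, head, _ in tokens}
    case_of = {head: tid for tid, _, head, rel in tokens if rel == "case"}
    out = []
    for tid, form, head, rel in tokens:
        if rel == "case":
            out.append((tid, form, heads[head], rel))   # adposition moves up
        elif tid in case_of:
            out.append((tid, form, case_of[tid], rel))  # noun under adposition
        else:
            out.append((tid, form, head, rel))
    return out

def mean_dep_length(tokens):
    """Mean absolute distance between dependents and their heads."""
    arcs = [abs(tid - head) for tid, _, head, _ in tokens if head != 0]
    return sum(arcs) / len(arcs)

def head_direction_entropy(tokens):
    """Entropy (bits) of arc direction, pooled over all non-root arcs:
    0 if heads always sit on the same side of their dependents, 1 if
    both directions are equally likely. (Gulordava and Merlo use a
    conditioned variant; this pooled version only shows the idea.)"""
    dirs = Counter("L" if head < tid else "R"
                   for tid, _, head, _ in tokens if head != 0)
    total = sum(dirs.values())
    return -sum(c / total * log2(c / total) for c in dirs.values())

# "She lives in Paris" in content-head (UD) style:
ud = [(1, "She", 2, "nsubj"), (2, "lives", 0, "root"),
      (3, "in", 4, "case"), (4, "Paris", 2, "obl")]
fh = to_function_head(ud)
print(mean_dep_length(ud), mean_dep_length(fh))
```

On this toy sentence the content-head tree has a mean dependency length of 4/3, while the function-head variant has 1.0, mirroring the increase in dependency length that the paper attributes to the UD encoding.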