Proc. of the 37th ACL (Assoc. for Computational Linguistics) (1999)

Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars*

Jason Eisner
Dept. of Computer & Information Science
University of Pennsylvania
200 South 33rd Street, Philadelphia, PA 19104 USA
jeisner@linc.cis.upenn.edu

Giorgio Satta
Dip. di Elettronica e Informatica
Università di Padova
via Gradenigo 6/A, 35131 Padova, Italy
satta@dei.unipd.it

Abstract

Several recent stochastic parsers use bilexical grammars, where each word type idiosyncratically prefers particular complements with particular head words. We present O(n^4) parsing algorithms for two bilexical formalisms, improving the prior upper bounds of O(n^5). For a common special case that was known to allow O(n^3) parsing (Eisner, 1997), we present an O(n^3) algorithm with an improved grammar constant.

1 Introduction

Lexicalized grammar formalisms are of both theoretical and practical interest to the computational linguistics community. Such formalisms specify syntactic facts about each word of the language, in particular the type of arguments that the word can or must take. Early mechanisms of this sort included categorial grammar (Bar-Hillel, 1953) and subcategorization frames (Chomsky, 1965). Other lexicalized formalisms include those of Schabes et al. (1988), Mel'čuk (1988), and Pollard and Sag (1994).

Besides the possible arguments of a word, a natural-language grammar does well to specify possible head words for those arguments. "Convene" requires an NP object, but some NPs are more semantically or lexically appropriate here than others, and the appropriateness depends largely on the NP's head (e.g., "meeting"). We use the general term bilexical for a grammar that records such facts. A bilexical grammar makes many stipulations about the compatibility of particular pairs of words in particular roles. The acceptability of "Nora convened the party" then depends on the grammar writer's assessment of whether parties can be convened.

Several recent real-world parsers have improved state-of-the-art parsing accuracy by relying on probabilistic or weighted versions of bilexical grammars (Alshawi, 1996; Eisner, 1996; Charniak, 1997; Collins, 1997). The rationale is that soft selectional restrictions play a crucial role in disambiguation.¹

The chart parsing algorithms used by most of the above authors run in time O(n^5), because bilexical grammars are enormous (the part of the grammar relevant to a length-n input has size O(n^2) in practice). Heavy probabilistic pruning is therefore needed to get acceptable runtimes. But in this paper we show that the complexity is not so bad after all:

• For bilexicalized context-free grammars, O(n^4) is possible.
• The O(n^4) result also holds for head automaton grammars.
• For a very common special case of these grammars where an O(n^3) algorithm was previously known (Eisner, 1997), the grammar constant can be reduced without harming the O(n^3) property.

Our algorithmic technique throughout is to propose new kinds of subderivations that are not constituents. We use dynamic programming to assemble such subderivations into a full parse.

2 Notation for context-free grammars

The reader is assumed to be familiar with context-free grammars.
Our notation fol-

* The authors were supported respectively under ARPA Grant N6600194-C-6043 "Human Language Technology" and Ministero dell'Università e della Ricerca Scientifica e Tecnologica project "Methodologies and Tools of High Performance Systems for Multimedia Applications."

¹ Other relevant parsers simultaneously consider two or more words that are not necessarily in a dependency relationship (Lafferty et al., 1992; Magerman, 1995; Collins and Brooks, 1995; Chelba and Jelinek, 1998).
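The Introduction's description of a bilexical grammar, and its estimate that the part of the grammar relevant to a length-n input has size O(n^2), can be illustrated with a toy table. In this Python sketch the entries, the role labels "subj" and "obj", the weights, and the helper name relevant_entries are all hypothetical (none come from the paper); a signed weight stands in for a soft selectional restriction. The only point is that, for a fixed set of roles, a length-n sentence offers at most n(n-1) ordered (head, dependent) token pairs per role to look up.

```python
from typing import Dict, List, Tuple

# Hypothetical weighted bilexical entries: (head word, role, dependent head word) -> weight.
# Positive weights favor a pairing, negative weights disfavor it (a soft restriction).
BILEXICAL: Dict[Tuple[str, str, str], float] = {
    ("convened", "obj", "meeting"): 2.0,
    ("convened", "obj", "party"): -1.5,
    ("convened", "subj", "Nora"): 0.5,
    ("ate", "obj", "meeting"): -3.0,
}


def relevant_entries(words: List[str],
                     table: Dict[Tuple[str, str, str], float]
                     ) -> Dict[Tuple[int, str, int], float]:
    """Return the table entries usable for this sentence, keyed by token positions.

    Every ordered pair of distinct positions (h, d) is tried for each role, so
    both the work and the result size are O(n^2) for a fixed set of roles.
    """
    roles = sorted({role for (_, role, _) in table})
    found: Dict[Tuple[int, str, int], float] = {}
    for h, head in enumerate(words):
        for d, dep in enumerate(words):
            if h == d:
                continue  # a word is never its own dependent
            for role in roles:
                weight = table.get((head, role, dep))
                if weight is not None:
                    found[(h, role, d)] = weight
    return found


if __name__ == "__main__":
    print(relevant_entries(["Nora", "convened", "the", "meeting"], BILEXICAL))
```

For "Nora convened the meeting" only the subject and object entries for "convened" survive; replacing "meeting" with "party" swaps in the negative object weight, which is one way a weighted grammar could record that parties are poor things to convene.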
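The O(n^5) and O(n^4) figures in the Introduction can be made concrete by counting the position indices that one binary combination step ranges over, ignoring grammar symbols entirely. This Python sketch assumes positions 0..n-1 and looks at one of the two symmetric cases: a left constituent spanning i..k with head h2 combines with a right constituent spanning k+1..j whose head h becomes the head of the result, so the naive step binds five indices. The two-step alternative is a hypothetical decomposition offered in the spirit of the paper's "subderivations that are not constituents", not a transcription of its algorithm: first attach the left constituent to a future head h while binding (i, h2, k, h), then join that partial item with the right constituent while binding (i, k, h, j). The function names are made up for this example.

```python
from math import comb


def naive_tuples(n: int) -> int:
    """Count index tuples (i, h2, k, h, j) with i <= h2 <= k < h <= j < n.

    h2 heads the left constituent [i, k]; h heads the right constituent
    [k+1, j] and becomes the head of the combined span [i, j].
    """
    count = 0
    for i in range(n):
        for k in range(i, n):
            for j in range(k + 1, n):
                count += (k - i + 1) * (j - k)  # choices for h2 times choices for h
    return count


def split_tuples(n: int) -> int:
    """Count index tuples for a hypothetical two-step version of the same combination.

    Step 1 binds (i, h2, k, h) with i <= h2 <= k < h < n: the left constituent is
    attached to the future head h, and h2 is not needed afterwards.
    Step 2 binds (i, k, h, j) with i <= k < h <= j < n: the partial item is joined
    with the right constituent [k+1, j] headed by h.
    """
    step1 = sum(k - i + 1
                for i in range(n) for k in range(i, n) for _h in range(k + 1, n))
    step2 = sum(j - k
                for i in range(n) for k in range(i, n) for j in range(k + 1, n))
    return step1 + step2


if __name__ == "__main__":
    for n in (8, 16, 32):
        # Closed forms from a stars-and-bars argument: C(n+3, 5) and 2*C(n+2, 4).
        print(n, naive_tuples(n), comb(n + 3, 5), split_tuples(n), 2 * comb(n + 2, 4))
```

The naive count is C(n+3, 5), which grows like n^5/120, while the split count is 2·C(n+2, 4), which grows like n^4/12. Their ratio is exactly (n+3)/10, so the two counts tie at n = 7 and the split version binds fewer tuples for every n ≥ 8, with the advantage growing linearly in n.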
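The Introduction also mentions an O(n^3) algorithm for a common special case (Eisner, 1997) and describes the general technique of assembling subderivations that are not constituents. A well-known illustration of that idea is the split-head dynamic program usually called Eisner's algorithm for projective dependency parsing, whose chart items are "complete" and "incomplete" half-spans with the head at one edge. This Python sketch swaps in that simplified setting as an assumption. It uses a plain matrix of arc weights, score[h][m] for head h and dependent m, with position 0 as an artificial root, instead of a bilexical CFG or head automaton grammar. It returns only the best total weight. It is not the paper's algorithm, and the function name and score layout are hypothetical.

```python
from typing import List

NEG_INF = float("-inf")
LEFT, RIGHT = 0, 1  # LEFT: head at the right end t of span s..t; RIGHT: head at the left end s


def best_projective_score(score: List[List[float]]) -> float:
    """Best total arc weight of a projective dependency tree rooted at position 0.

    score[h][m] is a hypothetical weight for making position m a dependent of h.
    Position 0 is an artificial root that may take several dependents; arcs into
    position 0 never reach the returned item, so score[h][0] is ignored.
    Three nested loops over positions give O(n^3) time.
    """
    n = len(score)
    # complete[s][t][d]: best half-constituent over s..t, headed at one end,
    #   whose head takes no further dependents inside s..t.
    # incomplete[s][t][d]: best item over s..t whose newest arc links s and t
    #   (t -> s for LEFT, s -> t for RIGHT); the dependent still lacks the
    #   dependents on its side facing away from the head.
    complete = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        complete[s][s][LEFT] = complete[s][s][RIGHT] = 0.0

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Join a RIGHT-headed piece s..r with a LEFT-headed piece r+1..t,
            # then add an arc between the endpoints s and t.
            join = max(complete[s][r][RIGHT] + complete[r + 1][t][LEFT]
                       for r in range(s, t))
            incomplete[s][t][LEFT] = join + score[t][s]   # arc t -> s
            incomplete[s][t][RIGHT] = join + score[s][t]  # arc s -> t
            # Finish an incomplete item with a complete piece headed by the
            # dependent, on the dependent's side facing away from the head.
            complete[s][t][LEFT] = max(complete[s][r][LEFT] + incomplete[r][t][LEFT]
                                       for r in range(s, t))
            complete[s][t][RIGHT] = max(incomplete[s][r][RIGHT] + complete[r][t][RIGHT]
                                        for r in range(s + 1, t + 1))
    return complete[0][n - 1][RIGHT]


if __name__ == "__main__":
    # Root 0 plus two words; only arcs 0 -> 2 and 2 -> 1 carry weight.
    print(best_projective_score([[0, 0, 10], [0, 0, 0], [0, 5, 0]]))
```

In the usage example the best tree is 0 → 2 → 1 with total weight 15. Items of the half-span kind are not constituents, because a complete item holds only one side of its head's dependents; that is what keeps every combination step down to three free positions.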
