Example Sentences and Making them Useful for Theoretical and Computational Linguistics
Stefan Müller
Email: Stefan.Mueller@cl.uni-bremen.de
http://www.cl.uni-bremen.de/~stefan/
DGfS-Jahrestagung Mainz, 27.02.2004

Outline
• Why test suites / data collections?
• What do we have?
• B-Ger-TS
• Demo
• Suggestions for using test suites / data collections
• Guidelines
• Conclusions

Why are Test Suites Needed for NLP?
• Language is very complex → minimal changes to a grammar may have unexpected effects
• Check improvement in grammar development
  – coverage
  – processing speed
  – memory requirements

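Such checks become routine once every test item is stored with its parse result, so that two runs can be compared. The following Python sketch is only an illustration: the `ItemResult` record, its fields, the `compare_runs` function, and all numbers are hypothetical and are not taken from [incr TSDB()] or any other tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemResult:
    """Outcome of parsing one test item (hypothetical record layout)."""
    item: str           # the test item
    grammatical: bool   # judgment stored in the test suite
    readings: int       # number of analyses the grammar found
    seconds: float      # parse time
    kilobytes: int      # memory used


def is_correct(r: ItemResult) -> bool:
    """Grammatical items must parse; ungrammatical items must not."""
    return (r.readings > 0) == r.grammatical


def compare_runs(old, new):
    """Summarize how a grammar change affected coverage, time and memory."""
    old_by_item = {r.item: r for r in old}
    # Items the old grammar handled correctly but the new one does not.
    regressions = [
        r.item for r in new
        if r.item in old_by_item
        and is_correct(old_by_item[r.item]) and not is_correct(r)
    ]
    return {
        "correct_old": sum(is_correct(r) for r in old),
        "correct_new": sum(is_correct(r) for r in new),
        "regressions": regressions,
        "seconds_old": sum(r.seconds for r in old),
        "seconds_new": sum(r.seconds for r in new),
        "kilobytes_old": sum(r.kilobytes for r in old),
        "kilobytes_new": sum(r.kilobytes for r in new),
    }


# Made-up results for one grammatical and one ungrammatical item.
old_run = [
    ItemResult("die alte Wand", True, 1, 0.02, 120),
    ItemResult("der alte Wand", False, 0, 0.01, 100),
]
new_run = [
    ItemResult("die alte Wand", True, 1, 0.03, 130),
    ItemResult("der alte Wand", False, 1, 0.02, 110),  # now wrongly accepted
]
summary = compare_runs(old_run, new_run)
print(summary["regressions"])
```

Counting an ungrammatical item as correct only when it receives no analysis is what makes overgeneration show up as a regression, not just loss of coverage.
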
What Test Suites and Databases are There?
• Test suites developed in TSNLP (Oepen, Netter and Klein, 1997)
  – English
  – German
  – French
• Test suites that come with [incr TSDB()], which is part of the LKB (Copestake, 2002)
  – English (Lingo, CSLI)
  – German (VM, DFKI)
  – Spanish
  – Japanese
  – Norwegian
• Babel Test Suite
• A3-Datenbank in Tübingen (Sternefeld et al.)
• Others?

Why Should we Have Additional Ones? (I)
• Babel Test Suite is unsystematic, naturally grown from a diploma thesis
• TSNLP is very systematic:
  (1) a.   die alte Wand
      b. * der alte Wand
      c. * das alte Wand
      d. * des alte Wand
      e. * den alte Wand
      f. * dem alte Wand
      g. * die alte Wände
      h. * der alte Wände
      i. * das alte Wände
      j. * des alte Wände
      k. * den alte Wände
      l. * dem alte Wände
      m. * der alte Wänden
      n. * die alte Wänden

Why Should we Have Additional Ones? (II)
TSNLP covers only a part of what is needed:
• Phenomena are missing.
• There are many strange ungrammatical sentences that are relevant only in the discussion of a particular analysis. Such items are not in TSNLP. Examples:
  – Agreement as a head feature and coordination
  – Haider's Designated Argument as a head feature and coordination of unergatives and unaccusatives

B-Ger-TS (I)
• B-Ger-TS was developed from Babel-TS
• contains examples I gathered over the past ten years
• I started to systematize it and to cross-classify items with regard to phenomena
• extended the database with examples from the literature
• provided references to bibliographic sources
• eliminated lexical ambiguity

B-Ger-TS (II)
• verb position, scrambling, fronting and island data, extraposition, subjacency, ...
• coherent/incoherent constructions, complex predicates, particle verbs, control and raising, AcI constructions
• incomplete category fronting with adjectives and verbs, multiple frontings
• adjunction in the nominal and verbal area
  – attributive adjectives and participles
  – prepositional phrases
  – relative clauses
• free relative clauses
• left dislocation
• topic drop

B-Ger-TS (III)
• depictive secondary predicates
• passive in various forms (e.g., stative passive, dative passive, lassen passive)
• modal infinitives
• coordination
• and the interaction between all of this!
• items are cross-classified according to the phenomena
• retrieval with respect to various aspects is possible

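Retrieval over cross-classified items amounts to a subset test on phenomenon labels. A minimal Python sketch, assuming each item is stored together with the set of phenomena it illustrates; the `items` mapping and the `retrieve` function are hypothetical and do not reflect how [incr TSDB()] stores or queries its data.

```python
# Hypothetical storage: each test item maps to the set of phenomena
# it is classified under. The two items are examples from the test suite.
items = {
    "daß der Mann schläft, der stirbt.": {"Extraposition"},
    "* daß ich nicht weiß, dieses Buch warum ich lesen sollte.":
        {"Extraktion", "w-Satz"},
}


def retrieve(items, *phenomena):
    """Return all items classified under every one of the given phenomena."""
    wanted = set(phenomena)
    return [text for text, labels in items.items() if wanted <= labels]


# Items that illustrate the interaction of extraction and w-clauses.
print(retrieve(items, "Extraktion", "w-Satz"))
```
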
Demo of TSDB

Suggestions for Using Test Suites / Data Collections
• All published grammar fragments should come with a list of the test suites used and the results obtained. (Many already do, mainly those connected to the CSLI/DFKI groups.)
• Example: http://www.cl.uni-bremen.de/Fragments/b-ger-gram.html
• Journal articles can be written and reviewed with reference to publicly available data collections.

The Format
• simple ASCII text
• a line starting with ‘;;;’ names a phenomenon; it applies to all items up to the next line starting with ‘;;;’

    ;;; Extraposition
    daß der Mann schläft, der stirbt. ;; Extraposition aus Subjekt
    Der Mann liebt Maria, der ihn verachtet. ;; Extraposition aus Subjekt im Vorfeld
    Den Mann liebt Maria, der ihn verachtet. ;; Extraposition aus Objekt im Vorfeld
    Daß Karl schläft, ist dem Mann aufgefallen, der ihn kennt. ;; @ nach Haider94

• everything that follows ‘;;’ and precedes ‘@’ is a comment
• everything that follows ‘@’ is the source of the example
• cross-classification of phenomena: list the phenomena separated by ‘+’

    ;;; Extraktion + w-Satz
    * daß ich nicht weiß, dieses Buch warum ich lesen sollte. ;; @GMueller98a:244

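The conventions above are simple enough to read with a few lines of code. The following Python sketch is one possible reader, not the tool used for B-Ger-TS: the `Item` record and the `parse_test_suite` function are hypothetical names, and treating a leading ‘*’ as the mark of an ungrammatical item is an assumption based on the starred example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str          # the example without the leading '*'
    grammatical: bool  # False if the line started with '*' (assumption)
    phenomena: list    # labels from the preceding ';;;' line
    comment: str       # text between ';;' and '@'
    source: str        # text after '@'


def parse_test_suite(lines):
    """Parse lines that follow the ';;;' / ';;' / '@' conventions."""
    items = []
    phenomena = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";;;"):
            # Phenomenon header; '+' separates cross-classified labels.
            phenomena = [p.strip() for p in line[3:].split("+") if p.strip()]
            continue
        text, _, annotation = line.partition(";;")
        comment, _, source = annotation.partition("@")
        text = text.strip()
        if not text:
            continue  # skip lines that carry only a comment
        items.append(Item(
            text=text.lstrip("*").strip(),
            grammatical=not text.startswith("*"),
            phenomena=list(phenomena),
            comment=comment.strip(),
            source=source.strip(),
        ))
    return items


sample = """\
;;; Extraposition
daß der Mann schläft, der stirbt. ;; Extraposition aus Subjekt
Daß Karl schläft, ist dem Mann aufgefallen, der ihn kennt. ;; @ nach Haider94
;;; Extraktion + w-Satz
* daß ich nicht weiß, dieses Buch warum ich lesen sollte. ;; @GMueller98a:244
"""
items = parse_test_suite(sample.splitlines())
for item in items:
    print(item.grammatical, item.phenomena, item.source)
```

Each item carries its own copy of the phenomenon labels, so items can later be filtered by any combination of phenomena without reference to their position in the file.
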
Lexical Ambiguity and Efficiency
Ambiguity in case does not hurt, but ambiguity in number does.
(2) a. Will der Manager lachen?
    b. Will der Mann lachen?
Because Manager can be singular or plural, the bare noun Manager projects to a full NP, and Manager lachen to a full VP and a sentence.
Even worse: if the verb has an optional object, we get unwanted ambiguities:
(3) Will der Manager essen? (der = subject, Manager = object)
(4) a. Will der Manager essen? → 307 passive edges
    b. Will der Mann essen? → 114 passive edges

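Test items with this problem can be found mechanically if the number-ambiguous noun forms are known. A minimal Python sketch, assuming such a list is available: the `NUMBER_AMBIGUOUS` set is a hypothetical stand-in for a lookup in the grammar's lexicon, and `number_ambiguous_tokens` is a hypothetical helper.

```python
import re

# Hypothetical word list: nouns whose singular and plural forms coincide.
# A real check would query the lexicon of the grammar under test.
NUMBER_AMBIGUOUS = {"Manager", "Lehrer", "Fenster"}


def number_ambiguous_tokens(sentence):
    """Return the tokens of a test item that are ambiguous in number."""
    tokens = re.findall(r"\w+", sentence)
    return [t for t in tokens if t in NUMBER_AMBIGUOUS]


for sentence in ["Will der Manager lachen?", "Will der Mann lachen?"]:
    print(sentence, number_ambiguous_tokens(sentence))
```

Items for which the function returns a non-empty list are candidates for replacing the noun with an unambiguous one such as Mann.
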
Lexical Ambiguity and Usability of Test Suites (Grammatical Sentences)
ihr is ambiguous between the dative feminine pronoun, the second person plural pronoun, and the possessive pronoun. A theory/grammar that makes wrong claims about case could analyze (5) as a sentence with two nominatives.
(5) Ihr helfen wir.
So the grammatical sentence could be parsed although the theory assigns a wrong structure or wrong case values.
