Jacy: an implemented HPSG grammar of Japanese


  1. Jacy: an implemented HPSG grammar of Japanese
     David Moeljadi and Takayuki Kuribayashi and many more
     Division of Linguistics and Multilingual Studies, Nanyang Technological University, Singapore
     The 25th International Conference on Head-Driven Phrase Structure Grammar
     University of Tokyo, Komaba Campus, 2 July 2018
     Moeljadi and Kuribayashi (LMS, NTU) Jacy 2 July 2018

  2. Jacy demo: Outline
     1. Introduction
        ▶ Motivation
        ▶ History and applications
        ▶ Deep Linguistic Processing with HPSG Initiative (DELPH-IN)
        ▶ Grammar engineering
        ▶ The current state
        ▶ Covered phenomena
        ▶ Coverage and evaluation
        ▶ Corpus/Treebank
     2. Phenomena *DEMO
        ▶ Argument scrambling and omission
        ▶ -reru / -rareru verbal endings
     3. Treebanking *DEMO
     4. Japanese-English machine translation *DEMO
     5. Conclusions and future work

  3. Siegel, Melanie, Emily M. Bender, and Francis Bond (2016) Jacy: an implemented grammar of Japanese. Stanford: CSLI Publications.

  4. Motivation
     Applications that rely on deep linguistic processing, such as message extraction systems, machine translation and dialogue understanding systems, are becoming feasible.
     ▶ Requirement for rich and highly precise information, well-defined output structures
     ▶ Requirement for robustness: wide coverage, large and extensible lexica, interfaces to preprocessing
     ▶ Requirement for extensibility to multiple languages
     ▶ Requirement for efficient processing
     The JACY Japanese HPSG has been developed for and used in real-world applications that require the handling of peripheral phenomena.

  5. History of the JACY grammar: Project context
     ▶ 1998-2000: Verbmobil: Machine translation of application-oriented spoken dialogues (http://verbmobil.dfki.de/)
     ▶ 2001-2002: Co-operation with YY Technologies (CA, USA): Automatic email response (Co-operation with Stephan Oepen, Ulrich Callmeier, Monique Sugimoto, Atsuko Shimada, Dan Flickinger) (http://www.dfki.de/~siegel/jacy/jacy.html)
     ▶ 2002-2004: EU project DeepThought: Hybrid and shallow methods for knowledge-intensive information extraction (http://www.project-deepthought.net)
     Lexeed project at Nippon Telegraph and Telephone Corporation: Ontology extraction, Hinoki treebank
     Japanese-English machine translation project with the LOGON initiative: JaEn, open-source semantic transfer-based machine translation

  6. Deep Linguistic Processing with HPSG Initiative (DELPH-IN)
     A research collaboration between linguists and computer scientists that builds and develops open-source grammars, tools for grammar development, and NLP applications using HPSG and MRS.
     ▶ Head-Driven Phrase Structure Grammar (HPSG; Pollard and Ivan A. Sag, 1994; Ivan A. Sag, Wasow, and Emily M. Bender, 2003): feature structures, type hierarchy, efficient processing
     ▶ Minimal Recursion Semantics (MRS; Copestake et al., 2005): flat semantic formalism, works well with typed feature structures, structures are underspecified for scopal information (compact representation of ambiguities)
     wiki page: http://moin.delph-in.net/FrontPage
     DELPH-IN discourse (Q&A): https://delphinqa.ling.washington.edu/
     18-22 June 2018: The 14th Annual DELPH-IN Summit, hosted by Berthold Crysmann (Laboratoire de linguistique formelle, CNRS & U Paris Diderot)
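To make "flat" and "underspecified for scopal information" concrete, here is a minimal Python sketch of an MRS-like structure. It is an illustration only, not the representation used by Jacy or by any DELPH-IN tool: the class names (`EP`, `FlatMRS`), the predicate names, and the handle and variable labels are all hypothetical. Each elementary predication sits in one flat list, and quantifier scope is left open by constraining only the restriction (`RSTR`) handles with qeq pairs, not the `BODY` handles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EP:
    """One elementary predication: a labelled predicate with role arguments."""
    label: str       # handle that labels this predication
    predicate: str   # hypothetical predicate name
    args: dict       # role name -> variable or handle


@dataclass
class FlatMRS:
    """A flat bag of EPs plus qeq handle constraints (hole, label)."""
    top: str
    eps: list
    qeqs: list

    def scopal_holes(self):
        """All handle-valued quantifier arguments (RSTR and BODY)."""
        return [
            (ep.predicate, role, value)
            for ep in self.eps
            for role, value in ep.args.items()
            if role in ("RSTR", "BODY")
        ]

    def unconstrained_holes(self):
        """Holes with no qeq constraint: this is where scope stays open."""
        constrained = {hole for hole, _ in self.qeqs}
        return [h for _, _, h in self.scopal_holes() if h not in constrained]


# Illustrative structure for a sentence like "every dog chases some cat".
example = FlatMRS(
    top="h0",
    eps=[
        EP("h4", "_every_q", {"ARG0": "x", "RSTR": "h5", "BODY": "h6"}),
        EP("h7", "_dog_n", {"ARG0": "x"}),
        EP("h8", "_chase_v", {"ARG0": "e", "ARG1": "x", "ARG2": "y"}),
        EP("h9", "_some_q", {"ARG0": "y", "RSTR": "h10", "BODY": "h11"}),
        EP("h12", "_cat_n", {"ARG0": "y"}),
    ],
    qeqs=[("h0", "h8"), ("h5", "h7"), ("h10", "h12")],
)

if __name__ == "__main__":
    print("EPs in the flat list:", len(example.eps))
    print("Unconstrained holes:", example.unconstrained_holes())
```

Both quantifier bodies are left unconstrained, so the single structure stands for both scope orderings rather than listing each reading separately.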

  7. The Development Tools
     The Linguistic Knowledge Builder (LKB) (Copestake, 2002): grammar development system
     Platform for Experimentation with efficient HPSG processing Techniques (PET) (Callmeier, 2000): a very efficient HPSG parser, for processing
     Answer Constraint Engine (ACE) (Packard, 2013): an efficient processor for DELPH-IN HPSG grammars
     ITSDB or [incr tsdb()] (pronounced tee ess dee bee plus plus) (Oepen and Daniel Flickinger, 1998): a tool for testing, profiling the performance of the grammar (analyzing the coverage and performance), tracking changes, and annotating treebanks
     Full Forest Treebanker (FFTB) (Packard, 2014): a treebanking tool for DELPH-IN grammars, allowing the selection of an arbitrary tree from the “full forest” without enumerating/unpacking all analyses in the parsing stage

  8. Multilingual grammar development
     English Resource Grammar (ERG) (Dan Flickinger, 2000; Dan Flickinger, 2011)
     Jacy (Siegel, Emily M. Bender, and Bond, 2016)
     Zhong (Fan, Song, and Bond, 2015), for Chinese languages (Mandarin, Cantonese, ...)
     Indonesian Resource Grammar (INDRA) (Moeljadi, Bond, and Song, 2015), for Indonesian
     ...
     The LinGO Grammar Matrix (Emily M. Bender, Dan Flickinger, and Oepen, 2002) (Emily M. Bender, Drellishak, et al., 2010): a web-based questionnaire for writing new DELPH-IN grammars

  9. Other tools
     Linguistic Type Data-Base (LTDB): a documentation containing linguistic description of lexical types, usage examples and distribution based on the grammar and treebanks, typed feature structure definitions of the lexical types
        https://github.com/fcbond/ltdb
        http://compling.hss.ntu.edu.sg/ltdb/Jacy_1301/
     typediff: a tool to investigate and compare phenomena in one grammar (e.g. JACY) with those in other DELPH-IN grammars (e.g. ERG)
        https://github.com/ned2/typediff
     PyDelphin: a set of Python libraries for the processing of DELPH-IN data
        https://github.com/delph-in/pydelphin
     delphin-viz: DELPH-IN data structure visualizations and demo interface
        http://delph-in.github.io/delphin-viz/demo/
     Demophin: a DELPH-IN web demo
        http://chimpanzee.ling.washington.edu/demophin/jacy/

  10. Grammar engineering
     Figure: Grammar Development Cycle (Emily M. Bender, Dan Flickinger, and Oepen, 2011)
     Steps shown in the figure: Develop initial test suite; Identify phenomena to analyze; Develop analysis; Extend test suite with examples documenting analysis; Implement analysis; Compile grammar; Debug implementation; Parse sample sentences; Parse full test suite; Treebank

  11. Grammar engineering
     Grammar engineering courses: http://moin.delph-in.net/TeachingCourses
     Grammar engineering FAQ: http://moin.delph-in.net/GrammarEngineeringFaq
     Feature Geometry FAQ: http://moin.delph-in.net/GeFaqFeatureGeometry (see also the cheat sheet)

  12. Installation
     Install Emacs
        sudo apt install emacs
     Install subversion
        sudo apt install subversion
     Install logon (see LogonInstallation page)
        svn checkout http://svn.emmtee.net/trunk logon
     Install git
        sudo apt install git
     Install JACY
        git clone https://github.com/delph-in/jacy.git
     Install ACE
        http://sweaglesw.org/linguistics/ace/

  13. The current state: grammar size
     Table: Change in grammar size over time

     Year      2000   2001   2002   2003    2005    2008    2009    2015
     Rules       27     50     51     54      47      81      86     137
     Lexemes  3,399  5,147  5,369  5,681  35,220  30,898  56,944  56,914
     Types    1,246  1,709  1,736  1,889   2,204   2,185   2,324   2,473

  14. Covered phenomena
     Verbs and adjectives
        ▶ Inflectional and derivational rules
        ▶ Auxiliary constructions
        ▶ Passive constructions
        ▶ Causative
     Nominal structures
        ▶ Names and named entities
        ▶ Pronouns (demonstrative, locative, personal, reflexive)
        ▶ Nominalizers
        ▶ Temporal nouns
        ▶ Noun modification (relative clause)
        ▶ Numeral classifiers
     Particles
     Adverbs
     Interrogatives
     Demonstratives
     Honorifics

  15. Test suites and coverage
     A test suite is a curated collection of test items (sometimes including both grammatical and ungrammatical examples) meant to test specific properties of a grammar.
     ▶ ‘mrs’: a small set of sentences, originally in English, that are meant to cover some of the basic semantic phenomena (argument structure, quantification, negation, modification etc.) http://moin.delph-in.net/MatrixMrsTestSuite
     ▶ ‘vanilla’: a collection of phenomena that are specific to Japanese
     ▶ etc.

     Table: Coverage on Test suites

                                            Parsed as is          Handling unknowns
     Type        Test Suite      # Sents    # Sents  Cover (%)    # Sents  Cover (%)
     Functional  mrs                 135        126         93        127         94
                 vanilla             120        105         87        105         87
     Natural     kinou1             1500       1321         88       1328         88
                 kinou2             1099        918         83        940         85
                 kinou3             1500       1114         74       1145         76
                 haikingu            104         34         32         66         63
                 tanaka/tc-003      1500       1172         78       1173         78
                 tanaka/tc-004      1500       1136         75       1145         76
                 tanaka/tc-005      1116        866         77        883         79
     Total
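The coverage figures on this slide are the share of test items that received a parse. A minimal Python sketch of that calculation follows. Two things here are assumptions rather than statements from the slide: the pairing of counts with test suites (read off the flattened table), and the truncation to a whole percent, which was chosen only because it reproduces the percentages shown for these rows. The function name `coverage_percent` is hypothetical.

```python
def coverage_percent(parsed: int, total: int) -> int:
    """Share of items with a parse, as a whole percent (truncated, an assumption)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= parsed <= total:
        raise ValueError("parsed must be between 0 and total")
    return (100 * parsed) // total


# (total items, parsed as is, parsed with unknown-word handling);
# the pairing of counts with suites is an assumed reading of the slide's table.
suites = {
    "mrs": (135, 126, 127),
    "vanilla": (120, 105, 105),
    "haikingu": (104, 34, 66),
}

if __name__ == "__main__":
    for name, (total, as_is, with_unknowns) in suites.items():
        print(
            name,
            coverage_percent(as_is, total),
            coverage_percent(with_unknowns, total),
        )
```

Comparing the two percentages per suite shows how much unknown-word handling helps: the gain is small for the functional suites and large for haikingu.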

  16. The Hinoki Treebank
     The Lexeed corpus
        ▶ at Nippon Telegraph and Telephone Corporation (NTT)
        ▶ 53,600 dictionary definition sentences and 36,000 example sentences
     The Tanaka corpus
        ▶ at the Japanese National Institute of Information and Communications Technologies (NICT)
        ▶ 15,000 example sentences

     Table: Hinoki manual annotation result

     Type                          Number      %
     Good   Single Good Tree        7,809   52.1
            Multiple Good Trees     2,082   14.0
     Bad    No Good Trees           1,604   10.7
            No Parse Found          2,826   18.8
            Resource Limitation       679    4.5
     Total                         15,000    100
