
XLE: Grammar Development Platform, Parser/Generator - PowerPoint PPT Presentation



  1. XLE: Grammar Development Platform, Parser/Generator
     Miriam Butt (Universität Konstanz)
     Tracy Holloway King (PARC)
     COLING 2004 Tutorial

     Tutorial Outline
     • What is a deep grammar and why would you want one?
     • XLE: A First Walkthrough
     • Robustness techniques
     • Generation
     • Disambiguation
     • Applications:
       – Machine Translation
       – Sentence Condensation
       – Computer Assisted Language Learning (CALL)
       – Knowledge Representation

     Applications of Language Engineering
     [Figure: applications plotted by Functionality (low to high) against Domain Coverage (narrow to broad). Labels: Post-Search Sifting, Google, Autonomous Knowledge Filtering, AltaVista, AskJeeves, Document Base Management, Good Translation, Restricted Dialogue, Useful Summary, Knowledge Fusion, Manually-tagged Keyword Search, Natural Dialogue, Microsoft Paperclip.]

     Deep grammars
     • Provide detailed syntactic/semantic analyses
       – HPSG (LinGO, Matrix), LFG (ParGram)
       – Grammatical functions, tense, number, etc.
           Mary wants to leave.
           subj(want~1,Mary~3)
           comp(want~1,leave~2)
           subj(leave~2,Mary~3)
           tense(leave~2,present)
     • Usually manually constructed
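The facts listed for "Mary wants to leave." are easy to consume programmatically. The following is a minimal sketch, assuming the analysis arrives as plain text in the `relation(head,dependent)` form shown on the slide; the names `FACT`, `parse_facts` and `subjects_of` are hypothetical helpers, not part of XLE.

```python
import re

# The analysis text is copied from the "Deep grammars" slide.
analysis = """subj(want~1,Mary~3)
comp(want~1,leave~2)
subj(leave~2,Mary~3)
tense(leave~2,present)"""

# relation(head,dependent); assumes no nested parentheses or commas in arguments.
FACT = re.compile(r"(\w+)\(([^,()]+),([^,()]+)\)")

def parse_facts(text):
    """Return a list of (relation, head, dependent) triples."""
    return [match.groups() for match in FACT.finditer(text)]

def subjects_of(facts):
    """Map each head to the dependent recorded as its subject."""
    return {head: dep for relation, head, dep in facts if relation == "subj"}

facts = parse_facts(analysis)
subjects = subjects_of(facts)
```

Here `subjects` records Mary as the subject of both `want~1` and `leave~2`, which is the kind of information a surface-only analysis would not state explicitly.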

  2. Why would you want one?
     • Meaning sensitive applications
       – overkill for many NLP applications
     • Applications which use shallow methods for English may not be able to do so for "free" word order languages
       – can read many functions off of trees in English
         » subj: NP sister to VP
         » obj: first NP sister to V
       – need other information in German, Japanese, etc.

     Deep analysis matters… if you care about the answer
     Example: A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.
     Question: Who flew to Chicago?
     Candidate answers:
       division       closest noun (shallow but wrong)
       head           next closest
       V.P. Philips   next
       delegation     furthest away but Subject of flew (deep and right)

     Why don't people use them?
     • Time consuming and expensive to write
       – shallow parsers can be induced automatically from a training set
     • Brittle
       – shallow parsers produce something for everything
     • Ambiguous
       – shallow parsers rank the outputs
     • Slow
       – shallow parsers are very fast (real time)
     • Other gating items for applications that need deep grammars

     Why should one pay attention now?
     New Generation of Large-Scale Grammars:
     • Robustness:
       – Integrated Chunk Parsers
       – Bad input always results in some (possibly good) output
     • Ambiguity:
       – Integration of stochastic methods
       – Optimality Theory used to rank/pick alternatives
     • Speed: comparable to shallow parsers
     • Accuracy and information content:
       – far beyond the capabilities of shallow parsers.
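The "Who flew to Chicago?" example can be made concrete with a small sketch. Everything below is hand-built illustration data: the token positions and the single `subj` fact are assumptions standing in for real tagger and parser output, and `shallow_answer` and `deep_answer` are hypothetical names.

```python
# Nouns of the example sentence with their (0-based) token positions,
# entered by hand; "flew" is token 12 once punctuation is dropped.
nouns = [("delegation", 1), ("Philips", 6), ("head", 7), ("division", 11)]
verb_position = 12

# A deep-analysis fact in the subj(verb, noun) style used on the slides.
# This is an assumed analysis, not actual XLE output.
facts = [("subj", "fly", "delegation")]

def shallow_answer(nouns, verb_position):
    """Shallow heuristic: pick the noun closest to the left of the verb."""
    preceding = [(verb_position - pos, word) for word, pos in nouns if pos < verb_position]
    return min(preceding)[1]

def deep_answer(facts, verb):
    """Deep lookup: return the noun recorded as the subject of the verb."""
    for relation, head, dependent in facts:
        if relation == "subj" and head == verb:
            return dependent
    return None

shallow = shallow_answer(nouns, verb_position)
deep = deep_answer(facts, "fly")
```

The distance heuristic returns "division", while the subject lookup returns "delegation", matching the "shallow but wrong" and "deep and right" labels on the slide.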

  3. XLE at PARC
     • Platform for Developing Large-Scale LFG Grammars
     • LFG (Lexical-Functional Grammar)
       – Invented in the 1980s (Joan Bresnan and Ronald Kaplan)
       – Theoretically stable
     • Solid Implementation
     • XLE is implemented in C, used with emacs, tcl/tk
     • XLE includes a parser, generator and transfer component.

     Basic LFG
     • Constituent-Structure: tree
     • Functional-Structure: Attribute Value Matrix, universal

       C-structure: [S [NP [PRON they]] [VP [V appear]]]
       F-structure:
         PRED   'appear<SUBJ>'
         TENSE  pres
         SUBJ   [ PRED 'pro'
                  PERS 3
                  NUM  pl ]

     Grammar components
     • Configuration: links components
     • Annotated phrase structure rules
     • Lexicon
     • Templates
     • Other possible components
       – Finite State (FST) morphology
       – disambiguation feature file

     Basic configuration file
       TOY ENGLISH CONFIG (1.0)
       ROOTCAT S.
       FILES .
       LEXENTRIES (TOY ENGLISH).
       RULES (TOY ENGLISH).
       TEMPLATES (TOY ENGLISH).
       GOVERNABLERELATIONS SUBJ OBJ OBJ2 OBL COMP XCOMP.
       SEMANTICFUNCTIONS ADJUNCT TOPIC.
       NONDISTRIBUTIVES NUM PERS.
       EPSILON e.
       OPTIMALITYORDER NOGOOD.
       ----
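An attribute value matrix maps naturally onto a nested dictionary. The sketch below encodes the f-structure for "they appear" from the Basic LFG slide; storing values as strings and the helper name `get_path` are illustrative choices, not how XLE represents f-structures internally.

```python
# F-structure for "they appear", transcribed from the slide.
f_structure = {
    "PRED": "appear<SUBJ>",
    "TENSE": "pres",
    "SUBJ": {"PRED": "pro", "PERS": "3", "NUM": "pl"},
}

def get_path(fs, path):
    """Follow a sequence of attribute names; return None if any is missing."""
    current = fs
    for attribute in path:
        if not isinstance(current, dict) or attribute not in current:
            return None
        current = current[attribute]
    return current

subject_number = get_path(f_structure, ["SUBJ", "NUM"])
```

A path such as `["SUBJ", "NUM"]` plays roughly the role of a designator like `(^ SUBJ NUM)` in the grammar notation.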

  4. Grammar sections
     • Rules, templates, lexicons
     • Each has:
       – version ID
       – component ID
       – XLE version number (1.0)
       – terminated by four dashes ----
     • Example
         STANDARD ENGLISH RULES (1.0)
         ----

     Syntactic rules
     • Annotated phrase structure rules
         Category --> Cat1: Schemata1;
                      Cat2: Schemata2;
                      Cat3: Schemata3.

         S --> NP: (^ SUBJ)=!
                   (! CASE)=NOM;
               VP: ^=!.

     Another sample rule
         "indicate comments"
         VP --> V: ^=!; "head"
                (NP: (^ OBJ)=! "() = optionality"
                     (! CASE)=ACC)
                PP*: ! $ (^ ADJUNCT). "$ = set"

     VP consists of:
       – a head verb
       – an optional object
       – zero or more PP adjuncts

     Lexicon
     • Basic form for lexical entries:
         word Category1 Morphcode1 Schemata1;
              Category2 Morphcode2 Schemata2.

         walk V * (^ PRED)='WALK<(^ SUBJ)>';
              N * (^ PRED) = 'A-WALK' .
         girl N * (^ PRED) = 'A-GIRL'.
         kick V * { (^ PRED)='KICK<(^ SUBJ)(^ OBJ)>'
                   |(^ PRED)='KICK<(^ SUBJ)>'}.
         the  D * (^ DEF)=+.
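The annotations in the S rule can be read as instructions for combining f-structures: `(^ SUBJ)=!` places the NP's f-structure under SUBJ of the mother, and `^=!` merges the VP's f-structure with the mother's. A minimal sketch of that merging step follows, using nested dictionaries with string values. It is a deliberate simplification: it treats PRED like any other atomic feature and does not handle sets such as ADJUNCT, and the function name `unify` is chosen here for illustration.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic string values).

    Returns the merged structure, or None if two atomic values clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None  # clash somewhere below this attribute
                result[key] = merged
            else:
                result[key] = value
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None

# NP "they" plus the rule annotation (! CASE)=NOM.
np = {"PRED": "pro", "PERS": "3", "NUM": "pl"}
np_nom = unify(np, {"CASE": "NOM"})

# VP "appear" carries ^=!, so it merges with the S f-structure,
# while the NP is placed under SUBJ by (^ SUBJ)=!.
vp = {"PRED": "appear<SUBJ>", "TENSE": "pres"}
s = unify({"SUBJ": np_nom}, vp)

# An accusative NP would clash with the NOM requirement.
clash = unify({"CASE": "ACC"}, {"CASE": "NOM"})
```

The clash case returns `None`, which corresponds to the rule failing to apply when the NP's case conflicts with the annotation.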

  5. Templates
     • Express generalizations
       – in the lexicon
       – in the grammar
       – within the template space

     No Template
         girl N * (^ PRED)='GIRL'
                  { (^ NUM)=SG
                    (^ DEF)
                  |(^ NUM)=PL}.

     With Template
         TEMPLATE: CN = { (^ NUM)=SG
                          (^ DEF)
                        |(^ NUM)=PL}.
         girl N * (^ PRED)='GIRL' @CN.
         boy  N * (^ PRED)='BOY' @CN.

     Template example cont.
     • Parameterize template to pass in values
         CN(P) = (^ PRED)='P'
                 { (^ NUM)=SG
                   (^ DEF)
                 |(^ NUM)=PL}.
         girl N * @(CN GIRL).
         boy  N * @(CN BOY).
     • Template can call other templates
         INTRANS(P) = (^ PRED)='P<(^ SUBJ)>'.
         TRANS(P) = (^ PRED)='P<(^ SUBJ)(^ OBJ)>'.
         OPT-TRANS(P) = { @(INTRANS P) | @(TRANS P) }.

     Parsing a string
     • create-parser demo-eng.lfg
     • parse "the girl walks"
     Walkthrough Demo

     Outline: Robustness
     Dealing with brittleness
     • Missing vocabulary
       – you can't list all the proper names in the world
     • Missing constructions
       – there are many constructions theoretical linguistics rarely considers (e.g. dates, company names)
     • Ungrammatical input
       – real world text is not always perfect
       – sometimes it is really horrendous
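Template calls behave much like macro expansion: a call such as `@(OPT-TRANS KICK)` is replaced by the template body with the parameter filled in, and any calls inside that body are expanded in turn. The sketch below imitates this with plain string substitution. It is an approximation for illustration only (one parameter per template, no handling of the bare `@CN` form), and `TEMPLATES`, `CALL` and `expand` are hypothetical names rather than anything XLE provides.

```python
import re

# Template name -> (parameter names, body), copied from the slide.
TEMPLATES = {
    "INTRANS": (["P"], "(^ PRED)='P<(^ SUBJ)>'"),
    "TRANS": (["P"], "(^ PRED)='P<(^ SUBJ)(^ OBJ)>'"),
    "OPT-TRANS": (["P"], "{ @(INTRANS P) | @(TRANS P) }"),
}

# Matches calls of the form @(NAME ARG ...).
CALL = re.compile(r"@\(([A-Z-]+)((?: [^()\s]+)*)\)")

def expand(text):
    """Recursively replace template calls with their instantiated bodies."""
    def replace(match):
        name = match.group(1)
        args = match.group(2).split()
        params, body = TEMPLATES[name]
        for param, arg in zip(params, args):
            # Replace the parameter only where it stands as a whole word.
            body = re.sub(rf"\b{re.escape(param)}\b", lambda _m, a=arg: a, body)
        return expand(body)
    return CALL.sub(replace, text)

kick_schemata = expand("@(OPT-TRANS KICK)")
```

Expanding `@(OPT-TRANS KICK)` yields the same two-way disjunction that the `kick` lexical entry spells out by hand.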

  6. Dealing with Missing Vocabulary
     • Build vocabulary based on the input of shallow methods
       – fast
       – extensive
       – accurate
     • Finite-state morphologies
         falls -> fall +Noun +Pl
                  fall +Verb +Pres +3sg
     • Build lexical entry on-the-fly from the morphological information

     Building lexical entries
     • Lexical entries
         -unknown N     XLE @(COMMON-NOUN %stem).
         +Noun    N-SFX XLE @(PERS 3).
         +Pl      N-NUM XLE @(NUM pl).
     • Rule
         Noun -> N N-SFX N-NUM.
     • Structure
         [ PRED  'fall'
           NTYPE common
           PERS  3
           NUM   pl ]

     Guessing words
     • Use FST guesser if the morphology doesn't know the word
       – Capitalized words can be proper nouns
           Saakashvili -> Saakashvili +Noun +Proper +Guessed
       – -ed words can be past tense verbs or adjectives
           fumped -> fump +Verb +Past +Guessed
                     fumped +Adj +Deverbal +Guessed

     Using the lexicons
     • Rank the lexical lookup
       1. overt entry in lexicon
       2. entry built from information from morphology
       3. entry built from information from guesser
          » quality will depend on language type
     • Use the most reliable information
     • Fall back only as necessary
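The ranked lookup on the "Using the lexicons" slide amounts to a fall-back chain. Here is a minimal sketch under stated assumptions: the `LEXICON` and `MORPHOLOGY` tables are tiny hand-made stand-ins for a real lexicon and finite-state morphology, the `girl` entry is a placeholder, and `guess` implements only the two heuristics named on the "Guessing words" slide.

```python
# Stand-in resources; real systems would consult lexicon files and an FST.
LEXICON = {"girl": ["girl N"]}  # placeholder for an overt lexical entry
MORPHOLOGY = {"falls": ["fall +Noun +Pl", "fall +Verb +Pres +3sg"]}

def guess(word):
    """Guess analyses from surface form alone (the two slide heuristics)."""
    guesses = []
    if word[:1].isupper():
        guesses.append(f"{word} +Noun +Proper +Guessed")
    if word.endswith("ed") and len(word) > 2:
        guesses.append(f"{word[:-2]} +Verb +Past +Guessed")
        guesses.append(f"{word} +Adj +Deverbal +Guessed")
    return guesses

def lookup(word):
    """Return (source, analyses) from the most reliable source that has any."""
    ranked_sources = (
        ("lexicon", LEXICON.get(word, [])),
        ("morphology", MORPHOLOGY.get(word, [])),
        ("guesser", guess(word)),
    )
    for source, analyses in ranked_sources:
        if analyses:
            return source, analyses
    return "none", []
```

With these tables, `lookup("falls")` is answered by the morphology, while `lookup("fumped")` and `lookup("Saakashvili")` fall through to the guesser, so the less reliable sources are consulted only when the better ones have nothing.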

  7. Missing constructions
     • Even large hand-written grammars are not complete
       – new constructions, especially with new corpora
       – unusual constructions
     • Generally longer sentences fail
     Solution: Fragment and Chunk Parsing
     • Build up as much as you can; stitch together the pieces

     Grammar engineering approach
     • First try to get a complete parse
     • If fail, build up chunks that get complete parses
     • Have a fall-back for things without even chunk parses
     • Link these chunks and fall-backs together in a single structure

     Fragment Chunks: Sample output
     • the the dog appears.
     • Split into:
       – "token" the
       – sentence "the dog appears"
       – ignore the period

     [Figure: F-structure for the fragment parse]
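The grammar engineering approach above (complete parse first, then chunks, then a token fall-back) can be sketched as a simple loop. The greedy left-to-right, longest-chunk-first strategy below is an assumption made for illustration and is not claimed to be how XLE selects fragments; `PARSABLE`, `can_parse` and `parse_with_fragments` are hypothetical names, and the period is assumed to have been dropped during tokenization.

```python
# Toy oracle: the only word sequence our pretend grammar can parse.
PARSABLE = {("the", "dog", "appears")}

def can_parse(tokens):
    """Stand-in for a real parser call; True if the tokens get a full parse."""
    return tuple(tokens) in PARSABLE

def parse_with_fragments(tokens, can_parse):
    """Return a list of (label, tokens) pieces that together cover the input."""
    if can_parse(tokens):
        return [("sentence", tokens)]
    pieces = []
    start = 0
    while start < len(tokens):
        # Try the longest parsable chunk beginning at this position.
        for end in range(len(tokens), start, -1):
            if can_parse(tokens[start:end]):
                pieces.append(("chunk", tokens[start:end]))
                start = end
                break
        else:
            # Nothing parses here: fall back to a single unanalyzed token.
            pieces.append(("token", tokens[start:start + 1]))
            start += 1
    return pieces

pieces = parse_with_fragments(["the", "the", "dog", "appears"], can_parse)
```

For the sample input, the result is a lone "token" piece for the first "the" followed by a chunk covering "the dog appears", mirroring the split shown on the sample output slide.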
