Improving Domain Independent Question Parsing with Synthetic Treebanks
COLING 2018: LAW-MWE-CxG
Halim-Antoine Boukaram*, Nizar Habash†, Micheline Ziadee*, and Majd Sakr‡
*American University of Science and Technology, Lebanon · †New York University Abu Dhabi, UAE · ‡Carnegie Mellon University, USA
{hboukaram,mziadee}@aust.edu.lb, nizar.habash@nyu.edu, msakr@cs.cmu.edu
Problem & Solution ● Automatic parsers perform poorly on question constructions ○ Most treebanks used for training are in the news domain, which lacks question constructions ● Proposed solution: synthetically create syntactic trees of questions on which to train parsers ● We present results on Standard Arabic, a morphologically rich and relatively low-resource language
Example of Question Parsing Errors ● "To where do I go to submit the application?" (Arabic question; slide contrasts the automatically parsed tree with the human-annotated parse)
Example of Question Parsing Errors ● "What time will the celebration start?" (Arabic question; slide contrasts the automatically parsed tree with the human-annotated parse)
Research Questions ● We explore two effective, low-cost techniques for adding annotated questions to the training corpus ○ Automatically generating questions from existing treebanks ○ Automatically generating questions from question templates ● Research questions: ○ How do these techniques compare with manually annotating additional questions? ○ Do combinations of synthetic and manual data improve accuracy?
Technique #1: QGen ● Automatically transform an annotated sentence into a number of annotated questions (4.75 on average) ○ (S (NP-SBJ the boy) (VP ate (NP-OBJ the apple))) ○ → (SBARQ (WHNP who) (S (VP ate (NP-OBJ the apple)))) ○ → (SQ (VP did) (NP-SBJ the boy) (VP eat (NP-OBJ the apple)))
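The declarative-to-question transformation above can be sketched as a tree rewrite. The following is a minimal illustration, not the authors' implementation: trees are nested (label, children) tuples, and the single who-rule shown is a simplified stand-in for the paper's generation procedures.

```python
# Minimal sketch of a QGen-style tree rewrite (illustrative only).
# A tree is a (label, children) tuple; a leaf is a plain string.

def leaves(node):
    """Collect the terminal words of a tree in order."""
    if isinstance(node, str):
        return [node]
    _, children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def gen_who_question(sent):
    """Rewrite (S (NP-SBJ ...) (VP ...)) as (SBARQ (WHNP who) (S (VP ...)))."""
    _, children = sent
    subj = next(c for c in children if isinstance(c, tuple) and c[0] == "NP-SBJ")
    rest = [c for c in children if c is not subj]
    return ("SBARQ", [("WHNP", ["who"]), ("S", rest)])

sent = ("S", [("NP-SBJ", ["the", "boy"]),
              ("VP", ["ate", ("NP-OBJ", ["the", "apple"])])])
question = gen_who_question(sent)
print(" ".join(leaves(question)))  # who ate the apple
```

A full system would apply several such rules per input tree (hence the 4.75 questions per sentence on average).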
Technique #1: QGen ● Words of the input tree are modified morphologically depending on the type of generated question ○ Arabic who-questions ■ Sentences with gender- and number-specific verbs → questions with masculine-singular verbs
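The verb adjustment for who-questions can be illustrated with a toy lookup. The entries below are hypothetical forms in Buckwalter-style transliteration; a real system would use an Arabic morphological analyzer/generator rather than a table.

```python
# Toy normalization table (hypothetical forms, Buckwalter-style
# transliteration); a real system would use a morphological generator.
MASC_SG = {
    "Akalat": "Akala",   # 'she ate'       -> 'he ate'
    "AkaluwA": "Akala",  # 'they (m) ate'  -> 'he ate'
    "Akalna": "Akala",   # 'they (f) ate'  -> 'he ate'
}

def to_masc_sg(verb):
    # Leave unknown forms unchanged.
    return MASC_SG.get(verb, verb)

print(to_masc_sg("Akalat"))  # Akala
```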
QGen Examples ● Each example slide pairs the original phrase structure with a generated question structure: simple SQ, simple SBARQ, modified SQ, modified SBARQ, and modified SBARQ (who)
Limitations of QGen ● Overgeneration introduces errors: some synthetic questions are nonsensical ● Limited coverage of the modeled question structures ● The input domain may differ from the desired question domain
Technique #2: QTemp ● Generate question templates in a desired domain ○ Where is %place%? ● Annotate the question templates ● Fill in the placeholder elements to produce questions ○ Where is the bathroom? ○ Where is the dean’s office? ○ Where is the finance office? ○ Where is ...?
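Template filling can be sketched as a cartesian product over placeholder values. The template and fillers below are illustrative, taken from the slide's example; the paper's system additionally splices the annotated token's tree into the annotated template tree.

```python
import itertools

def fill_template(template, slots):
    """Expand a template over every combination of placeholder values."""
    keys = list(slots)
    questions = []
    for combo in itertools.product(*(slots[k] for k in keys)):
        q = template
        for key, value in zip(keys, combo):
            q = q.replace("%" + key + "%", value)
        questions.append(q)
    return questions

qs = fill_template("Where is %place%?",
                   {"place": ["the bathroom", "the dean's office",
                              "the finance office"]})
print(qs[0])  # Where is the bathroom?
```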
QTemp Examples ● Annotated question template + annotated token = annotated question (two tree-splicing examples shown)
Experimental Setup ● Baseline treebank: Penn Arabic Treebank (PATB) ● Two synthetic treebanks ○ QGen and QTemp ● Two manually annotated treebanks ○ TalkShow and Chatbot ● Test the accuracy of parsers trained on: ○ Synthetic vs. manual data ○ Combined vs. synthetic or manual alone
Data Sets

| Treebank | Domain | Train: # sentences (# words) | Test: # sentences (# words) |
| PATB (part 3) | News articles | 10,836 (320,998) | 794 (12,884) |
| PATBQ | News articles | N/A | 67 (1,054) |
| TalkShow | Political talk show | 544 (2,691) | 143 (692) |
| Chatbot | Conversational | 239 (1,505) | 62 (441) |
| QGenPATB | News articles (synthetic) | 962 (8,140) | N/A |
| QTemp | Conversational (synthetic) | 1,607 (13,099) | N/A |
Results (parser accuracy; columns indicate the training data)

| Test corpus | Baseline (PATB) | +Synthetic (QGenPATB + QTemp) | +Manual (TalkShow + Chatbot) | All |
| PATB | 80.6 | 80.6 | 80.6 | 80.9 |
| PATBQ | 73.8 | 74.0 | 74.9 | 75.9 |
| TalkShow | 88.2 | 87.3 | 91.4 | 92.9 |
| Chatbot | 90.5 | 93.6 | 93.3 | 94.1 |
| Macro Average Q | 84.2 | 84.9 | 86.5 | 87.6 |
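The "Macro Average Q" row is the unweighted mean over the three question test sets (PATBQ, TalkShow, Chatbot), excluding the PATB test set. A quick sanity check, reproducing the reported row to within rounding:

```python
# Per-column scores on the question test sets: PATBQ, TalkShow, Chatbot.
scores = {
    "Baseline":          [73.8, 88.2, 90.5],
    "+QGenPATB+QTemp":   [74.0, 87.3, 93.6],
    "+TalkShow+Chatbot": [74.9, 91.4, 93.3],
    "All":               [75.9, 92.9, 94.1],
}
macro = {name: sum(vals) / len(vals) for name, vals in scores.items()}
# Reported row: 84.2, 84.9, 86.5, 87.6 (agrees to within rounding).
```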
Conclusions and Future Work ● Synthetic question treebanks are useful for improving question parsing ● The domain of the synthetic treebank must match the target question domain ● Future work: investigate how well the synthetic techniques transfer to other languages, and write more question-generation procedures ● The manual and synthetic treebanks will be published through the Linguistic Data Consortium
Thank You