A Corpus and Model Integrating Multiword Expressions and Supersenses • Nathan Schneider, Noah Smith • NAACL-HLT • June 3, 2015, Denver
Given a sentence, find & categorize minimal units of meaning cheaply, with broad coverage 2
Noam_Chomsky refused to give_in_to the vicious daddy_longlegs . 3
Lexical segmentation (multiword expressions): Noam_Chomsky refused to give_in_to the vicious daddy_longlegs . (variant also shown: Alan_Black refused to give_in_to the vicious daddy_longlegs .) 7
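Lexical segmentation can be viewed as partitioning a sentence's tokens into units, where each MWE is one unit and every other token stands alone. A minimal sketch, assuming MWEs are given as groups of 0-based token positions (a representation chosen here for illustration, not the corpus's file format); `segment` and `render` are hypothetical helper names:

```python
from typing import List, Sequence, Tuple

def segment(n_tokens: int, mwe_groups: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Partition token positions 0..n_tokens-1 into lexical units.

    Hypothetical representation: each MWE group lists the 0-based positions
    of one multiword expression; every uncovered position becomes a
    single-word unit. Units are returned in order of their first token.
    """
    covered = set()
    units = []
    for group in mwe_groups:
        unit = tuple(sorted(group))
        if len(unit) < 2:
            raise ValueError(f"an MWE needs at least 2 tokens: {unit}")
        if unit[0] < 0 or unit[-1] >= n_tokens:
            raise ValueError(f"position out of range: {unit}")
        if covered.intersection(unit):
            raise ValueError(f"token(s) assigned to more than one MWE: {unit}")
        covered.update(unit)
        units.append(unit)
    units.extend((i,) for i in range(n_tokens) if i not in covered)
    return sorted(units, key=lambda unit: unit[0])

def render(tokens: Sequence[str], units: Sequence[Tuple[int, ...]]) -> str:
    """Join each unit's words with '_' (gaps are not marked in this sketch)."""
    return " ".join("_".join(tokens[i] for i in unit) for unit in units)

tokens = "Noam Chomsky refused to give in to the vicious daddy longlegs .".split()
groups = [(0, 1), (4, 5, 6), (9, 10)]  # Noam Chomsky, give in to, daddy longlegs
print(render(tokens, segment(len(tokens), groups)))
# prints: Noam_Chomsky refused to give_in_to the vicious daddy_longlegs .
```

Because units are index tuples rather than contiguous spans, the same structure can also hold gappy MWEs such as `(0, 2)`.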
[Figure: illustrations of give_in_to, daddy_longlegs, Noam_Chomsky; credit: Jonathan Huang] 8
Supersense tagging: Noam_Chomsky/N:PERSON refused/V:COGNITION to give_in_to/V:SOCIAL the vicious daddy_longlegs/N:ANIMAL . (tokens without a label receive no supersense) 9
Outline • Background ‣ multiword expressions ‣ supersenses • Dataset • Joint model • Results 10
Definition (Baldwin & Kim, 2010; Schneider et al., LREC 2014) • Multiword expression (MWE): 2 or more orthographic words/lexemes that function together as an idiomatic whole • Idiomatic = not fully predictable in form, function, and/or frequency ‣ unusual morphosyntax: Me/*Him neither; by and large; plural of daddy longlegs? ‣ non- or semi-compositional: ice cream, daddy longlegs, pay attention ‣ statistically collocated: p(highly unlikely) > p(strongly unlikely) 11
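The "statistically collocated" criterion can be made concrete with corpus counts. A toy sketch, assuming a tiny invented corpus (real estimates need a large corpus) and using pointwise mutual information (PMI), one common association measure, alongside the raw bigram probability; none of this is taken from the paper:

```python
import math
from collections import Counter

# Tiny corpus invented purely for illustration; not real frequency data.
corpus = [
    "it is highly unlikely to rain",
    "that seems highly unlikely",
    "a highly unlikely outcome",
    "we strongly disagree",
    "it is strongly unlikely",
    "she strongly prefers tea",
]

sentences = [s.split() for s in corpus]
unigrams = Counter(w for sent in sentences for w in sent)
bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
n_unigrams = sum(unigrams.values())
n_bigrams = sum(bigrams.values())

def p_bigram(w1: str, w2: str) -> float:
    """Relative frequency of the adjacent pair (w1, w2) among all bigrams."""
    return bigrams[(w1, w2)] / n_bigrams

def pmi(w1: str, w2: str) -> float:
    """Pointwise mutual information, log2 p(w1 w2) / (p(w1) p(w2)).

    Unigram and bigram probabilities are normalized separately, a common
    simplification for a sketch like this.
    """
    p12 = p_bigram(w1, w2)
    if p12 == 0.0:
        return float("-inf")
    p1 = unigrams[w1] / n_unigrams
    p2 = unigrams[w2] / n_unigrams
    return math.log2(p12 / (p1 * p2))

# "highly" and "strongly" are equally frequent here, yet "highly unlikely"
# co-occurs more often, so both measures rank it higher.
for adverb in ("highly", "strongly"):
    print(adverb, unigrams[adverb], round(p_bigram(adverb, "unlikely"), 3),
          round(pmi(adverb, "unlikely"), 2))
```

The point of comparing two equally frequent adverbs is that the preference for "highly" cannot be explained by the frequency of the individual words, which is what makes the pair a collocation.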
Definition (as above), overlaid with: SPECIALLY LEARNED 12
MWE examples: Noam Chomsky; daddy longlegs, hot dog; dry out (the clothes); depend on, come across; pay attention (to), pay close attention (to), no attention was paid (to); put up with, give in (to); under the weather; cut and dry; in spite of; pick up where __ left off (e.g., pick up where they left off); easy as pie; You’re welcome.; To each his own.; The structure of this paper is as follows. 13
The CMWE Corpus (Schneider et al., LREC 2014) • The entire REVIEWS subsection of the English Web Treebank (Bies et al. 2012), comprehensively annotated for MWEs ‣ 723 reviews ‣ 3,800 sentences ‣ 55,000 words • Found 3,500 MWE instances • 57% of all sentences (72% of sentences with >10 words) contain an MWE 14
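The coverage figures are simple proportions over sentences. A sketch of how such numbers could be computed, assuming each sentence is summarized by its length in words and its number of annotated MWE instances (the `Sentence` record and `mwe_coverage` function are hypothetical); the toy data below are invented and are not the CMWE counts:

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple

@dataclass(frozen=True)
class Sentence:
    """Hypothetical per-sentence summary; not the corpus's actual data format."""
    n_tokens: int  # sentence length in words
    n_mwes: int    # number of annotated MWE instances

def mwe_coverage(sents: Sequence[Sentence], longer_than: int = 10) -> Tuple[float, float]:
    """Return (share of all sentences containing an MWE,
    share of sentences with more than `longer_than` words containing an MWE)."""
    def share(subset: Iterable[Sentence]) -> float:
        subset = list(subset)
        if not subset:
            return 0.0
        return sum(s.n_mwes > 0 for s in subset) / len(subset)
    return share(sents), share(s for s in sents if s.n_tokens > longer_than)

# Invented toy data, NOT the CMWE corpus.
toy = [Sentence(5, 0), Sentence(12, 1), Sentence(20, 2), Sentence(8, 1), Sentence(15, 0)]
overall, long_only = mwe_coverage(toy)
print(f"{overall:.0%} of all sentences; {long_only:.0%} of sentences with >10 words")
# prints: 60% of all sentences; 67% of sentences with >10 words
```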
CMWE Example (Schneider et al., LREC 2014) They gave me the run around and missing paperwork only to call back to tell me someone else wanted her and I would need to come in and put down a deposit . 15
CMWE Example (Schneider et al., LREC 2014) They gave_ me _the_run_around and missing paperwork only to call_back to tell me someone else wanted her and I would need to come_in and put_down a deposit . Simplified a bit for presentational purposes (we also made a strong/weak distinction) 16
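The underscore notation in this example encodes both contiguous MWEs (`call_back`) and gappy ones (`gave_ me _the_run_around`). A minimal parser sketch, assuming the conventions listed in its docstring, which are a reading of the slide's display notation and not a description of the corpus's released file format; `parse_mwe_notation` is a hypothetical name, and the strong/weak distinction is ignored:

```python
from typing import List, Optional, Tuple

def parse_mwe_notation(text: str) -> Tuple[List[str], List[List[int]]]:
    """Parse the slide's underscore notation into words and MWE groups.

    Assumed conventions (inferred from the example, not a documented format):
      * "call_back" joins adjacent words into one MWE;
      * a trailing "_" ("gave_") leaves that MWE open across a gap;
      * a leading "_" ("_the_run_around") resumes the open MWE;
      * at most one MWE is open at a time (no nested gaps).
    Returns the word list and, per MWE, the 0-based indices of its words.
    """
    words: List[str] = []
    groups: List[List[int]] = []
    open_group: Optional[List[int]] = None  # MWE currently interrupted by a gap
    for chunk in text.split():
        parts = [p for p in chunk.split("_") if p]
        if not parts:
            raise ValueError(f"chunk has no words: {chunk!r}")
        indices = list(range(len(words), len(words) + len(parts)))
        words.extend(parts)
        continues = chunk.endswith("_")
        if chunk.startswith("_"):
            if open_group is None:
                raise ValueError(f"nothing to resume at {chunk!r}")
            open_group.extend(indices)  # mutates the list already in `groups`
            current, open_group = open_group, None
        elif len(indices) > 1 or continues:
            current = indices
            groups.append(current)
        else:
            current = None  # ordinary single-word unit
        if continues:
            if open_group is not None:
                raise ValueError(f"nested gap at {chunk!r} is not supported")
            open_group = current
    if open_group is not None:
        raise ValueError("input ends inside a gap")
    return words, groups

sentence = ("They gave_ me _the_run_around and missing paperwork only to call_back "
            "to tell me someone else wanted her and I would need to come_in and "
            "put_down a deposit .")
words, groups = parse_mwe_notation(sentence)
for group in groups:
    print(group, " ".join(words[i] for i in group))
# [1, 3, 4, 5] gave the run around
# [11, 12] call back
# [25, 26] come in
# [28, 29] put down
```

Storing each MWE as a list of word indices keeps the gap word ("me") outside the expression while still recording that "gave" and "the run around" belong together.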
Supersenses: I ’m/V:STATIVE in the green_room/N:LOCATION getting_ready/V:COGNITION for my panel/N:GROUP #textworld 17
Noun supersenses (26): ACT, ANIMAL, ARTIFACT, ATTRIBUTE, BODY, COGNITION, COMMUNICATION, EVENT, FEELING, FOOD, GROUP, LOCATION, MOTIVE, NATURAL OBJECT, OTHER, PERSON, PHENOMENON, PLANT, POSSESSION, PROCESS, QUANTITY, RELATION, SHAPE, STATE, SUBSTANCE, TIME • Verb supersenses (15): BODY, CHANGE, COGNITION, COMMUNICATION, COMPETITION, CONSUMPTION, CONTACT, CREATION, EMOTION, MOTION, PERCEPTION, POSSESSION, SOCIAL, STATIVE, WEATHER 18
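A tagger's label set is just this inventory with a part-of-speech prefix. A sketch, assuming tags are spelled as in the examples (`N:PERSON`, `V:SOCIAL`) and that the labels are exactly the 26 noun and 15 verb classes listed; `parse_tag` is a hypothetical helper:

```python
from typing import Tuple

# Inventory as listed on the slide (26 noun and 15 verb supersenses).
NOUN_SUPERSENSES = frozenset({
    "ACT", "ANIMAL", "ARTIFACT", "ATTRIBUTE", "BODY", "COGNITION",
    "COMMUNICATION", "EVENT", "FEELING", "FOOD", "GROUP", "LOCATION",
    "MOTIVE", "NATURAL OBJECT", "OTHER", "PERSON", "PHENOMENON", "PLANT",
    "POSSESSION", "PROCESS", "QUANTITY", "RELATION", "SHAPE", "STATE",
    "SUBSTANCE", "TIME",
})
VERB_SUPERSENSES = frozenset({
    "BODY", "CHANGE", "COGNITION", "COMMUNICATION", "COMPETITION",
    "CONSUMPTION", "CONTACT", "CREATION", "EMOTION", "MOTION",
    "PERCEPTION", "POSSESSION", "SOCIAL", "STATIVE", "WEATHER",
})
INVENTORY = {"N": NOUN_SUPERSENSES, "V": VERB_SUPERSENSES}

def parse_tag(tag: str) -> Tuple[str, str]:
    """Split a tag like 'N:PERSON' into ('N', 'PERSON'), rejecting unknown labels.

    The 'N:'/'V:' spelling is assumed from the examples, not a documented format.
    """
    pos, sep, label = tag.partition(":")
    labels = INVENTORY.get(pos)
    if not sep or labels is None or label not in labels:
        raise ValueError(f"not a noun/verb supersense tag: {tag!r}")
    return pos, label

# Some labels exist for both parts of speech, so the N:/V: prefix matters.
print(sorted(NOUN_SUPERSENSES & VERB_SUPERSENSES))
# ['BODY', 'COGNITION', 'COMMUNICATION', 'POSSESSION']
print(parse_tag("V:SOCIAL"))
# ('V', 'SOCIAL')
```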
Supersenses • Semantic classes originally defined by WordNet • Can be inferred from WordNet sense annotations in SemCor (Miller et al. 1993) • …or annotated directly (Schneider et al. 2012: Arabic Wikipedia; this work) ‣ also Johannsen et al. 2014: English Twitter • Automatic tagging (Ciaramita & Altun 2006; Paaß & Reichartz 2009; Schneider et al. 2013; Johannsen et al. 2014) 19
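Inferring a supersense from a WordNet sense annotation amounts to reading off the lexicographer file of the annotated synset: in NLTK, for example, `wn.synset("dog.n.01").lexname()` returns `"noun.animal"`. The sketch below only performs the string mapping, so it runs without WordNet data; the renamings `noun.Tops -> OTHER` and `noun.object -> NATURAL OBJECT` are assumptions chosen to match the inventory labels shown earlier, and `lexname_to_supersense` is a hypothetical name:

```python
def lexname_to_supersense(lexname: str) -> str:
    """Map a WordNet lexicographer file name (e.g. 'noun.animal') to a tag like 'N:ANIMAL'.

    Assumed label conventions for the two names spelled differently in the
    inventory: noun.Tops -> OTHER and noun.object -> NATURAL OBJECT.
    Adjective and adverb files have no noun/verb supersense and are rejected.
    """
    pos, _, name = lexname.partition(".")
    prefix = {"noun": "N", "verb": "V"}.get(pos)
    if prefix is None or not name:
        raise ValueError(f"no noun/verb supersense for {lexname!r}")
    renamed = {("N", "Tops"): "OTHER", ("N", "object"): "NATURAL OBJECT"}
    return f"{prefix}:{renamed.get((prefix, name), name.upper())}"

for name in ("noun.animal", "verb.social", "noun.Tops", "noun.object"):
    print(name, "->", lexname_to_supersense(name))
```

Applied to every sense-annotated noun and verb token in a corpus like SemCor, this yields supersense labels without any new annotation, though MWEs are only covered to the extent that WordNet lists them.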
Outline ✓ Background ‣ multiword expressions ‣ supersenses • Dataset • Joint model • Results 20
STREUSLE Corpus: Supersense-Tagged Repository of English with a Unified Semantics for Lexical Expressions 21
STREUSLE Corpus • Annotated with ‣ comprehensive MWEs ‣ noun and verb supersenses 22
I googled/V:COMMUNICATION restaurants/N:GROUP in the area/N:LOCATION and Fuji_Sushi/N:GROUP came_up/V:COMMUNICATION and reviews/N:COMMUNICATION were great so I made_ a carry_out/N:POSSESSION _order (made_…_order: V:COMMUNICATION) 23
STREUSLE Annotation • Starting point: CMWE corpus • Two main phases: ‣ noun supersenses ‣ verb supersenses • Some sentences were reserved for combined noun+verb annotation 29
STREUSLE Annotation • Noun supersenses: preexisting conventions, previously applied to Arabic Wikipedia (Schneider et al., 2012) • Verb supersenses: new conventions developed in this work 30