
Supersense Tagging for Arabic: The MT-in-the-Middle Attack
Nathan Schneider, Behrang Mohit, Chris Dyer, Kemal Oflazer, Noah A. Smith

Gameplan
• Supersense Tagging
• Baselines
• MT-in-the-Middle
• Analysis
• Outlook

Supersense Tagging
• A coarse form of word sense disambiguation (partitioning of WordNet synsets)
• Generalizes NER beyond proper names; 26 noun categories (Ciaramita & Johnson 2003)
Example (tag in brackets after the tagged word; SOCIAL is a verb tag, the others are noun tags):
Pierre Vinken [PERSON] , 61 years [TIME] old , will join [SOCIAL] the board [GROUP] as a nonexecutive director [PERSON]
• Categories broadly applicable across domains
• Scheme suitable for direct annotation (Schneider et al. 2012)

Supersense Tagging
• English resources
‣ WordNet (Fellbaum 1998)
‣ Tagger trained on English SemCor (Ciaramita & Altun 2006): 77% F1 in-domain
• Arabic resources
‣ Arabic WordNet (El Kateb et al. 2006)
‣ Named entities in OntoNotes (Hovy et al. 2006)
‣ Supersense-tagged Wikipedia corpus (Schneider et al. 2012): 65k words, 1/6 the size of SemCor

Baselines
• Heuristic matching of Arabic WordNet entries + OntoNotes NEs
‣ only covers 33% of nouns in our corpus
        P   R   F1
Ann-A  32  16  21.6
Ann-B  29  15  19.4
• Unsupervised sequence model
‣ feature-rich (Berg-Kirkpatrick et al. 2010)
        P   R   F1
Ann-A  20  16  17.5
Ann-B  14  10  11.6
[Evaluated on the Arabic Wikipedia test set: 18 articles, 40k words]

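MT-in-the-Middle (cf. Zitouni & Florian 2008; Rahman & Ng 2012)
Arabic input: تتكون الذرة من سحابة من الشحنات السالبة (الإلكترونات) تحوم حول نواة موجبة الشحنة صغيرة جدا في الوسط.
[English gloss: "The atom consists of a cloud of negative charges (electrons) hovering around a very small, positively charged nucleus in the center."]
Slide labels: cdec, GWord, NIST 2012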

MT-in-the-Middle
MT output with English supersense tags (tag in brackets after the tagged word):
The corn [PLANT] is composed of negative shipments [ARTIFACT] ( electronics [COGNITION] ) cloud hovering over the nucleus [BODY] of a very small positive shipment [ARTIFACT] in the center [LOCATION] .

MT-in-the-Middle
Tags shown above the sentence (in slide order): COGNITION ARTIFACT PLANT
The corn is composed of negative shipments ( electronics ) cloud hovering over the nucleus [BODY] of a very small positive shipment [ARTIFACT] in the center [LOCATION] .

MT-in-the-Middle
• Heuristic lexicon matching:
        P   R   F1
Ann-A  32  16  21.6
Ann-B  29  15  19.4
• MT-in-the-Middle:
        P   R   F1
Ann-A  37  31  33.8
Ann-B  38  32  34.6

MT-in-the-Middle
• MT-in-the-Middle:
        P   R   F1
Ann-A  37  31  33.8
Ann-B  38  32  34.6
• Hybrid:
        P   R   F1
Ann-A  35  36  35.5
Ann-B  36  36  36.0
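As a quick sanity check on the tables, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the P and R values printed on the slides. The helper name `f1` and the `rows` dictionary are illustrative and not from the talk. Because the slides show P and R rounded to whole percentages, the recomputed values can differ from the reported F1 by a few tenths of a point.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) rows copied from the slides.
rows = {
    "Heuristic lexicon matching, Ann-A": (32, 16, 21.6),
    "Heuristic lexicon matching, Ann-B": (29, 15, 19.4),
    "MT-in-the-Middle, Ann-A": (37, 31, 33.8),
    "MT-in-the-Middle, Ann-B": (38, 32, 34.6),
    "Hybrid, Ann-A": (35, 36, 35.5),
    "Hybrid, Ann-B": (36, 36, 36.0),
}

for name, (p, r, reported) in rows.items():
    # Recomputed from rounded P and R, so small deviations are expected.
    print(f"{name}: recomputed {f1(p, r):.1f}, reported {reported}")
```

The hybrid gains mostly on recall (36 vs. 31 to 32 for MT-in-the-Middle alone), which is what lifts its F1 despite slightly lower precision.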

Analysis
• Pipeline has many places for noise: MT, English supersense tagging, and projection
• We focus on the impact of translation
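The projection stage of the pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `project_tags`, the alignment links, and the tokenization are all assumptions made for the example. It copies each English tag to the Arabic token aligned to it, using the first few words of the example sentence and its MT output.

```python
from typing import List, Optional, Tuple


def project_tags(
    n_source: int,
    english_tags: List[Optional[str]],
    alignment: List[Tuple[int, int]],
) -> List[Optional[str]]:
    """Copy each tagged English token's tag to the source tokens aligned to it.

    alignment holds (source_index, english_index) links. If a source token is
    linked to several tagged English tokens, the first link wins.
    """
    projected: List[Optional[str]] = [None] * n_source
    for src_i, en_j in alignment:
        tag = english_tags[en_j]
        if tag is not None and projected[src_i] is None:
            projected[src_i] = tag
    return projected


# Opening fragment of the Arabic example and of its MT output.
arabic = ["تتكون", "الذرة", "من", "سحابة", "من", "الشحنات", "السالبة"]
english = ["The", "corn", "is", "composed", "of", "negative", "shipments"]
english_tags = [None, "PLANT", None, None, None, None, "ARTIFACT"]

# Hypothetical word alignment links (source_index, english_index).
alignment = [(0, 2), (0, 3), (1, 0), (1, 1), (2, 4), (5, 6), (6, 5)]

arabic_tags = project_tags(len(arabic), english_tags, alignment)
for token, tag in zip(arabic, arabic_tags):
    print(token, tag)
```

Here the Arabic word for "atom" ends up tagged PLANT: the projection itself is correct, but it faithfully carries over the error introduced when MT rendered the word as "corn". This is the kind of noise propagation the slide refers to.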

Analysis
• Compare cdec vs. an off-the-shelf Arabic-English system from QCRI
• Translation quality:
        BLEU   METEOR  TER
QCRI   32.86  32.10   0.46
cdec   28.84  31.38   0.49
• ...but for MTiTM supersense tagging, cdec is consistently better (by 2–4 points). Why?

Analysis
• Observation: overall MT scores do not necessarily measure preservation of coarse lexical semantics
‣ We really care about (rough) semantic adequacy for noun phrases
‣ We elicited lexical translation acceptability judgments for a sample of sentences (cf. Carpuat 2013: SSST)

Analysis
• Lexical acceptability rates: 91.9% for QCRI, 90.0% for cdec
• Example errors
‣ corn, maize for atom
‣ shipments for charges
‣ electronics for electrons
‣ transliteration: IMAX for EMACS, genoa lynx for GNU Linux

Analysis
• So lexical translation is mostly OK, and QCRI does slightly better at it
• cdec's strength: providing better input to projection
‣ It produces word alignments, whereas QCRI gives phrase alignments
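One plausible reason word alignments help projection is that they pin a tagged English word to a specific source word, while a phrase alignment only narrows it down to a span. The sketch below illustrates that difference on a toy fragment. The helper functions, the alignment data, and the span convention (half-open index ranges) are assumptions for illustration and do not reflect the output format of cdec or the QCRI system.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open index range [start, end)


def candidates_from_word_alignment(
    en_index: int, links: List[Tuple[int, int]]
) -> List[int]:
    """Source indices linked to one English token by (source, english) links."""
    return sorted({s for s, e in links if e == en_index})


def candidates_from_phrase_alignment(
    en_index: int, phrase_pairs: List[Tuple[Span, Span]]
) -> List[int]:
    """Source indices in the phrase whose English side covers the token."""
    for (s_start, s_end), (e_start, e_end) in phrase_pairs:
        if e_start <= en_index < e_end:
            return list(range(s_start, s_end))
    return []


# Toy fragment: "nucleus" carries the tag BODY in the English output.
arabic = ["نواة", "موجبة", "الشحنة"]
english = ["nucleus", "positive", "shipment"]

word_links = [(0, 0), (1, 1), (2, 2)]   # hypothetical word alignment
phrase_pairs = [((0, 3), (0, 3))]       # hypothetical single phrase pair

word_candidates = candidates_from_word_alignment(0, word_links)
phrase_candidates = candidates_from_phrase_alignment(0, phrase_pairs)
print("word alignment:", [arabic[i] for i in word_candidates])
print("phrase alignment:", [arabic[i] for i in phrase_candidates])
```

With the word alignment, the tag has exactly one landing site. With the phrase pair, all three Arabic tokens are candidates, so the projection step needs an extra heuristic to choose among them.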

Outlook
• Supersense tagging can be accomplished (noisily) for a language so long as it can be automatically translated to English
• Further gains should come from:
‣ better MT: lexical translations and word alignments
‣ better English supersense tagging
‣ better lexicon & corpus resources

Thanks
• Francisco Guzman & Preslav Nakov @ QCRI
• Wajdi Zaghouani
• Waleed Ammar
• QNRF
• All of you for listening!
