Esta es una naranja atractiva: Adventures in Adapting an English Language Grounding System to Non-English Data

4/22/19. By: Caroline Kery. Overview: Research Goal, Introduction to Grounded Language, Prior Works, Methods, Results.


  1. Esta es una naranja atractiva: Adventures in Adapting an English Language Grounding System to Non-English Data
     By: Caroline Kery
     Committee: Dr. Cynthia Matuszek, Dr. Frank Ferraro, Dr. Timothy Oates

     Overview
     • Research Goal
     • Introduction to Grounded Language
     • Prior Works
     • Methods
     • Results
     • Conclusion and Future Work

     Research Goal
     Take a grounded language acquisition system and adapt it to non-English data

     What is Grounded Language Acquisition?
     • Tying language to the real world
     What does “cat” mean?
     Image sources:
     http://www.petmd.com/sites/default/files/what-does-it-mean-when-cat-wags-tail.jpg
     https://images.immediate.co.uk/volatile/sites/4/2018/08/iStock_000044061370_Medium-fa5f8aa.jpg?quality=45&crop=5px,17px,929px,400px&resize=960,413
     http://www.royalcanin.ca/~/media/Royal-Canin-Canada/Product-Categories/cat-adult-landing-hero.ashx
     https://www.akc.org/wp-content/themes/akc/component-library/assets/img/welcome.jpg

     Why is it important?
     • Robots can learn from users
     • Adaptable to new situations

     The English-centric Problem
     • A common problem in Natural Language Processing (NLP): systems are often designed with English in mind
     • Lots of materials available for English systems, not as much for others

  2. The English-Centric Problem
     • Robotic assistants should be accessible to non-English-speakers!

     Related works
     • Grounded language
       – Grounding actions (e.g. Kollar et al.), directions (e.g. Matuszek et al. 2012), sometimes multilingual (e.g. Chen et al. 2010)
     • Computer Vision
       – Object recognition (e.g. Bo et al. 2011), image captioning (e.g. Gella et al. 2017)
     • Multilingual Natural Language Processing
       – Machine translation (e.g. Wu et al. 2016), system adaptations (e.g. Poesio et al. 2010)

     The Grounded Language System
     (Pillai et al. RSS 2016)

     My Research Goal
     Take a grounded language acquisition system and adapt it to non-English data

     Methods
     • Analysis with Spanish and Hindi
       – Started with Google Translate data
         • Identified adaptations (primarily preprocessing)
       – Collected new crowd-sourced descriptions
         • Analyzed differences across languages with real data

     Methods
     • Analysis with Spanish and Hindi
     [Map labels: Hindi, an Indo-Iranian language; Spanish, a Romance language]
     Map from the Washington Post website: https://www.washingtonpost.com/pbox.php?url=http://www.washingtonpost.com/blogs/worldviews/files/2015/04/Screen-Shot-2015-04-23-at-9.04.22-AM.png&w=1484&op=resize&opt=1&filter=antialias&t=20170517

  3. Google Translated Data
     • Checked translation accuracy: back-translation

     Google Translated Data
     • Overall scores comparable

     Adjective-Noun Agreement
     [Figure panels: Hindi; Spanish]

     Necessary modification: Stemming
     • Lemmatizer:
       baked -> bake
       baking -> bake
       runs -> run
       running -> run
     • (Simple) Stemmer:
       baked -> bak
       baking -> bak
       runs -> run
       running -> runn
     Lemmatizers are hard to find outside of English

     Impact of Stemming on GT (Google Translate) Data

     Real Data Collection
     • Google Translate data is an approximation
     • Doesn’t necessarily reflect real language data
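The stemmer-versus-lemmatizer contrast on the stemming slide can be illustrated with a minimal sketch. The suffix list and the minimum stem length below are assumptions chosen for illustration; the slides do not say which stemmer or which rules the system actually used.

```python
# Hypothetical suffix list: the slides do not specify the stemmer's rules.
SUFFIXES = ("ing", "ed", "s")


def simple_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for word in ["baked", "baking", "runs", "running"]:
        print(word, "->", simple_stem(word))
```

Blind suffix stripping needs no dictionary, which is why it ports to other languages more easily than a lemmatizer. The cost is that related forms do not always collapse to one token ("runs" and "running" end up with different stems here).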

  4. Real Data Collection: Amazon Mechanical Turk
     • “Give 1 to 2 sentences describing the object”
     • No sample descriptions.

     Data Collection: Results
     • Around 6,000 descriptions were collected for each language

     Results: lots of overlap but also some variety!

     Data Collection: Results
     • Final counts for Spanish and Hindi were smaller due to problematic workers

     Analysis: Some properties that could impact scores
     • Token count
     • Stop words
     • Negative/Positive Examples

     Overall Scores

  5. Token Count
     • More tokens used in more specific contexts can raise the overall scores
     [Figure caption: Scores when problematic workers who used lots of unrelated terms were not removed from the Hindi dataset]

     Stop words
     • Generic and low IDF (Inverse Document Frequency)
     [Figure labels: Both; General stop word only; Low IDF stop word only]

     Stop words
     [Figure labels: General stop word only; Low IDF stop word only; Both]

     Stop words: Scores

     Particular Tokens and positive/negative Examples
     English stemmed   Count   F1 Score   Spanish stemmed   Count   F1 Score
     cabbag            237     0.9297     col               28      0.8352
     cabbag            -       -          repoll            113     0.8294

     Particular Tokens and positive/negative Examples
     English stemmed   Count   F1 Score   Spanish stemmed   Count   F1 Score
     yellow            562     0.8449     amarill           648     0.933
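The "low IDF" stop words on the stop-word slides can be made concrete with a short sketch. It assumes the common definition idf(t) = log(N / df(t)), where N is the number of descriptions and df(t) is the number of descriptions containing token t. The threshold and the toy descriptions are invented for illustration; the slides do not state the formula variant or the cutoff that was used.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {token: log(N / df)}, where df counts documents containing the token."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each token at most once per document
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}


def low_idf_tokens(documents, threshold):
    """Tokens whose IDF is at or below the (assumed) threshold."""
    idf = inverse_document_frequency(documents)
    return {tok for tok, score in idf.items() if score <= threshold}


# Toy descriptions, invented for illustration (not from the collected data).
descriptions = [
    "this is a yellow banana".split(),
    "this is a red apple".split(),
    "this is an orange".split(),
    "a green cabbage".split(),
]

if __name__ == "__main__":
    print(sorted(low_idf_tokens(descriptions, threshold=0.5)))
```

Tokens that appear in most descriptions get an IDF near zero, so a cutoff like this finds corpus-specific stop words without needing a hand-built stop-word list for each language.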
